DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 are pending.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on March 26, 2021 and November 09, 2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
Claims 10, 11, 13, 16, and 17 are objected to because of the following informalities: the claims recite “The method of claim 7”; however, claim 7 is directed to a memory module, not a method, and the independent method claim is claim 8. For examination purposes, the examiner interprets these claims to read “The method of claim 8”. Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 3-6 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang et al. (US 10,185,499) (hereinafter Wang) (published January 22, 2019).
Regarding Claim 1, Wang discloses a memory module comprising: one or more memory devices; and a near-memory computing module coupled to the one or more memory devices, the near-memory computing module comprising: one or more processing elements configured to process data from the one or more memory devices; and a memory controller configured to coordinate access of the one or more memory devices from a host and the one or more processing elements.
“CDIMM 502 comprises a near-memory compute module 510 coupled to a first set of DRAM devices 519.sub.1, a second set of DRAM devices 519.sub.2, and a flash memory device 520 (e.g., comprising a single flash memory device or multiple flash memory devices). Near-memory compute module 510 further comprises a transaction processor 530, a set of processor cores 514, a set of off-load engines 515, a first memory controller 516, a second memory controller 517, and a flash controller 518, wherein transaction processor 530 includes a set of data buffers 511, a data plane memory (e.g., within data plane module 512), and a control plane memory (e.g., within control plane module 513)” (Wang Col 15 lines 10-30)

Regarding Claim 3, Wang further discloses wherein the near-memory computing module is configured to control a first one of the one or more memory devices based on a first chip select signal and a second one of the one or more memory devices based on a second chip select signal.
“Host controller 102 can then issue a bank activate command (ACT) 203.sub.2 to near-memory compute module 120.sub.1. The combination of signals (e.g., chip select, bankgroup select, bank select, row address, etc.) provided by host controller 102 on system bus 110 during bank activate 203.sub.2 will, in part, cause near-memory compute module 120.sub.1 to either forward ACT command 212.sub.1 from host controller 102 directly to DIMM 108.sub.1, or initialize module 213.sub.1 for subsequent activity (e.g., receiving instructions and data for transactions or operations). If DIMM 108.sub.1 receives the forwarded bank activate command 203.sub.2 (e.g., from forward ACT 212.sub.1), DIMM 108.sub.1 will select the bank and row 233.sub.2 specified in bank activate command 203.sub.2” (Wang Col 8 lines 5-20)

Regarding Claim 4, Wang further discloses wherein the one or more memory devices and the near-memory computing module are arranged as a first rank, the memory module further comprising: a second rank; and a hierarchical bus structure configured to transfer data between the first rank and the second rank.
“A common form of pluggable memory is the Dual In-line Memory Module (DIMM). DIMMs can contain multiple memory chips (e.g., Dynamic Random Access Memory or DRAM chips), each of which has a particular data bus width (e.g., 4 bits or 8 bits). For example, a DIMM may have eight 8-bit DRAM chips, or sixteen 4-bit DRAM chips, arranged in parallel to provide a total 64-bit wide data bus. Each arrangement of 64 data bits from parallel DRAM chips is called a rank” (Wang Col 1 lines 25-35)

“Host controller 102 is coupled to CDIMMs 122.sub.1 and 122.sub.2 through system bus 110 and standard physical interface 111 as in environment 1A00. Host controller 102 can also continue to write and read datasets to and from memory chips 109.sub.1 and 109.sub.2. Near-memory compute modules 120.sub.1 and 120.sub.2 in environment 1B00 can offer additional capabilities beyond that of the system in environment 1A00” (Wang Col 5 line 60 to Col 6 line 5)

Regarding Claim 5, Wang further discloses wherein: the one or more memory devices comprise one or more first memory devices, the near-memory computing module comprises a first near-memory computing module, and the one or more processing elements comprise one or more first processing elements; and the second rank comprises: one or more second memory devices; and a second near-memory computing module coupled to the one or more second memory devices, the second near-memory computing module comprising: one or more second processing elements configured to process data from the one or more second memory devices; and a second memory controller configured to coordinate access of the one or more second memory devices from a host and the one or more second processing elements.
“Host controller 102 is coupled to CDIMMs 122.sub.1 and 122.sub.2 through system bus 110 and standard physical interface 111 as in environment 1A00. Host controller 102 can also continue to write and read datasets to and from memory chips 109.sub.1 and 109.sub.2. Near-memory compute modules 120.sub.1 and 120.sub.2 in environment 1B00 can offer additional capabilities beyond that of the system in environment 1A00” (Wang Col 5 line 60 to Col 6 line 5)

“CDIMM 502 comprises a near-memory compute module 510 coupled to a first set of DRAM devices 519.sub.1, a second set of DRAM devices 519.sub.2, and a flash memory device 520 (e.g., comprising a single flash memory device or multiple flash memory devices). Near-memory compute module 510 further comprises a transaction processor 530, a set of processor cores 514, a set of off-load engines 515, a first memory controller 516, a second memory controller 517, and a flash controller 518, wherein transaction processor 530 includes a set of data buffers 511, a data plane memory (e.g., within data plane module 512), and a control plane memory (e.g., within control plane module 513)” (Wang Col 15 lines 10-30)

Regarding Claim 6, Wang further discloses wherein: the memory module further comprises a hierarchical bus structure; and the near-memory computing module further comprises: an input buffer coupled between the hierarchical bus structure and the one or more processing elements; and an output buffer coupled between the hierarchical bus structure and the one or more processing elements.
“CDIMM 502 drives and receives signals to and from system bus 110, respectively, at data buffers 511. Data buffers 511 can provide buffering to boost the drive of the signals (e.g., DQ, CA, RA, etc.) on system bus 110 to help mitigate high electrical loads of large computing and memory systems” (Wang Col 15 lines 35-45)


Claims 8 and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Barber et al. (US 2013/0138923) (hereinafter Barber) (published May 30, 2013).
Regarding Claim 8, Barber discloses a method of processing a dataset, the method comprising: distributing a first portion of the dataset to a first memory module; distributing a second portion of the dataset to a second memory module;
“Threads may be configured to receive equally divided sets of input data elements, which may be referred to herein as a "partial input data set" (IDS)” (Barber [0019])

constructing a first local data structure at the first memory module based on the first portion of the dataset; constructing a second local data structure at the second memory module based on the second portion of the dataset; and merging the first local data structure and the second local data structure.
“In order to avoid performance degradation, many software systems having a shared-nothing architecture (e.g., Hadoop.RTM.) use another method, wherein each thread builds its own local data structure from its associated IDS, and then, merges the contents of the local data structure into the global data structure. Hadoop.RTM. is a registered trademark of the Apache Software Foundation” (Barber [0022] also see paragraphs [0030-0031] and fig. 3)
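By way of illustration only, the build-then-merge pattern described in the cited passage of Barber may be sketched as follows. This is a minimal Python sketch and is not code from Barber or from the application. The function names (split_dataset, build_local, merge_local) and the use of a Counter as the local data structure are assumptions made for the example.

```python
from collections import Counter

def split_dataset(data, parts):
    """Divide data into `parts` contiguous, near-equal partial input data sets."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def build_local(chunk):
    """Build one local data structure (here, a count table) from one portion."""
    return Counter(chunk)

def merge_local(structures):
    """Merge the local data structures into a single global structure."""
    merged = Counter()
    for local in structures:
        merged.update(local)
    return merged

dataset = ["a", "b", "a", "c", "b", "a"]
local_structures = [build_local(c) for c in split_dataset(dataset, 2)]
merged = merge_local(local_structures)
```

In this sketch each portion is processed independently, so the merged result is the same as a single count table built over the whole dataset.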

Regarding Claim 18, Barber discloses a system comprising: a first memory module configured to construct a first local data structure based on a first portion of a dataset; a second memory module configured to construct a second local data structure based on a second portion of the dataset; and
“In order to avoid performance degradation, many software systems having a shared-nothing architecture (e.g., Hadoop.RTM.) use another method, wherein each thread builds its own local data structure from its associated IDS” (Barber [0022] also see paragraphs [0030-0031] and fig. 3)

a host coupled to the first memory module and the second memory module through one or more memory channels, wherein the host is configured to: distribute the first portion of the dataset to the first memory module; distribute the second portion of the dataset to the second memory module; and
“Threads may be configured to receive equally divided sets of input data elements, which may be referred to herein as a "partial input data set" (IDS)” (Barber [0019])

merge the first local data structure and the second local data structure.
“merges the contents of the local data structure into the global data structure. Hadoop.RTM. is a registered trademark of the Apache Software Foundation” (Barber [0022] also see paragraphs [0030-0031] and fig. 3)


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (published January 22, 2019) as applied to claim 1 above, and further in view of Menezes et al. (US 10,303,662) (hereinafter Menezes) (published May 28, 2019).
Regarding Claim 2, Wang discloses the memory module of claim 1 but does not explicitly state wherein at least one of the one or more processing elements is configured to process data from the one or more memory devices by performing a counting operation on the data.
Menezes discloses wherein at least one of the one or more processing elements is configured to process data from the one or more memory devices by performing a counting operation on the data.
“If one bloom filter with bitmap BF is created per file F using the method described above, the individual bitmaps for the files in the ad-hoc set can be used as building blocks in order to efficiently estimate the number of unique segments inside any set of files {F1, F2, . . . , FN}. In this case, if two or more files share segments across them, each segment will be counted only once, so an estimate can be made as to how much physical space has actually been written. Moreover, and as indicated by process 606 of FIG. 3, two or more bloom filters for a number N of files can be readily combined by using a bitwise or operation on the bloom filter bitmaps. For example, given the bitmaps for file F1 and file F2, those bitmaps can be combined thus: B{F1, F2}=BF1\BF2” (Menezes Col 10 lines 35-50; the data is processed with the bloom filter for counting each segment)

It would have been obvious before the effective filing date of the invention to one of ordinary skill in the art to combine the counting-based processing of data in Menezes with Wang to yield the predictable results of reducing storage space by limiting redundancy and identifying the unique segments.
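By way of illustration only, the combination of per-file Bloom filter bitmaps by a bitwise OR, followed by an estimate of the number of unique segments, may be sketched as follows. This is a minimal Python sketch under stated assumptions: the filter size, the number of hash functions, the SHA-256-based hashing, and the fill-ratio estimator are choices made for the example, and Menezes may use different parameters or a different estimator.

```python
import hashlib
import math

NUM_BITS = 1024   # filter size in bits (assumed value)
NUM_HASHES = 3    # hash functions per item (assumed value)

def bit_positions(item):
    """Map an item to NUM_HASHES bit positions using salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{salt}:{item}".encode()).digest()[:8], "big") % NUM_BITS
        for salt in range(NUM_HASHES)
    ]

def make_filter(segments):
    """Build a Bloom filter bitmap (stored as a Python int) for one file's segments."""
    bitmap = 0
    for segment in segments:
        for pos in bit_positions(segment):
            bitmap |= 1 << pos
    return bitmap

def estimate_unique(bitmap):
    """Estimate distinct items from the number of set bits (fill-ratio estimator)."""
    set_bits = bin(bitmap).count("1")
    if set_bits >= NUM_BITS:
        return float("inf")  # filter saturated; no meaningful estimate
    return -(NUM_BITS / NUM_HASHES) * math.log(1 - set_bits / NUM_BITS)

file1 = [f"seg{i}" for i in range(50)]       # segments 0..49
file2 = [f"seg{i}" for i in range(30, 80)]   # segments 30..79, 20 shared with file1
combined = make_filter(file1) | make_filter(file2)  # bitwise OR of the two bitmaps
estimate = estimate_unique(combined)
```

Because a shared segment sets the same bits in both bitmaps, the OR of the two filters equals the filter of the combined segment set, so a shared segment contributes to the estimate only once.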

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Wang (published January 22, 2019) as applied to claim 1 above, and further in view of Ramesh et al. (US 2009/0228684) (hereinafter Ramesh) (published September 10, 2009).
Regarding Claim 7, Wang discloses the memory module of claim 1 but does not explicitly state wherein the near-memory computing module further comprises a workload monitor configured to balance a first workload of a first one of the one or more processing elements and a second workload of a second one of the one or more processing elements.
Ramesh discloses wherein the near-memory computing module further comprises a workload monitor configured to balance a first workload of a first one of the one or more processing elements and a second workload of a second one of the one or more processing elements.
“With multiple cores, cores may be allocated and reallocated at run-time to optimize for performance based on the load balancing on these core workloads” (Ramesh [0023])

It would have been obvious before the effective filing date of the invention to one of ordinary skill in the art to combine the load balancing in Ramesh with Wang to yield the predictable results of a more optimized system by having work spread out between cores and not having certain cores overloaded.
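By way of illustration only, balancing workloads across processing elements may be sketched as a greedy assignment that always gives the next task to the least-loaded element. This is a minimal Python sketch and is not the allocation scheme of Ramesh or of the application. The function name balance, the task costs, and the greedy largest-first policy are assumptions made for the example.

```python
import heapq

def balance(task_costs, num_elements):
    """Assign each task to the currently least-loaded processing element."""
    # Heap of (current_load, element_id); the smallest load is popped first.
    heap = [(0, element) for element in range(num_elements)]
    heapq.heapify(heap)
    assignment = {element: [] for element in range(num_elements)}
    # Placing the largest tasks first tends to give a more even final split.
    for cost in sorted(task_costs, reverse=True):
        load, element = heapq.heappop(heap)
        assignment[element].append(cost)
        heapq.heappush(heap, (load + cost, element))
    return assignment

assignment = balance([5, 3, 2, 2], 2)
loads = sorted(sum(tasks) for tasks in assignment.values())
```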

Claims 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Barber (published May 30, 2013) as applied to claims 8 and 18 above, and further in view of Menezes (published May 28, 2019).
Regarding Claim 9, Barber discloses the method of claim 8 and further discloses wherein: merging the first local data structure and the second local data structure forms a merged data structure; and
“In order to avoid performance degradation, many software systems having a shared-nothing architecture (e.g., Hadoop.RTM.) use another method, wherein each thread builds its own local data structure from its associated IDS, and then, merges the contents of the local data structure into the global data structure. Hadoop.RTM. is a registered trademark of the Apache Software Foundation” (Barber [0022])

Barber does not explicitly state that the method further comprises performing a counting operation on the merged data structure at the first memory module and the second memory module.
Menezes discloses the method further comprises performing a counting operation on the merged data structure at the first memory module and the second memory module.
“If one bloom filter with bitmap BF is created per file F using the method described above, the individual bitmaps for the files in the ad-hoc set can be used as building blocks in order to efficiently estimate the number of unique segments inside any set of files {F1, F2, . . . , FN}. In this case, if two or more files share segments across them, each segment will be counted only once, so an estimate can be made as to how much physical space has actually been written. Moreover, and as indicated by process 606 of FIG. 3, two or more bloom filters for a number N of files can be readily combined by using a bitwise or operation on the bloom filter bitmaps. For example, given the bitmaps for file F1 and file F2, those bitmaps can be combined thus: B{F1, F2}=BF1\BF2” (Menezes Col 10 lines 35-50; the files would be equivalent to the local data structures and the counting is done via bloom filter per file)

It would have been obvious before the effective filing date of the invention to one of ordinary skill in the art to combine the counting-based processing of data in Menezes with Barber to yield the predictable results of reducing storage space by limiting redundancy and identifying the unique segments.

Regarding Claim 20, Barber discloses the system of claim 18 but does not explicitly state wherein the first memory module is configured to perform a counting operation on the merged data structure.
Menezes discloses wherein the first memory module is configured to perform a counting operation on the merged data structure.
“If one bloom filter with bitmap BF is created per file F using the method described above, the individual bitmaps for the files in the ad-hoc set can be used as building blocks in order to efficiently estimate the number of unique segments inside any set of files {F1, F2, . . . , FN}. In this case, if two or more files share segments across them, each segment will be counted only once, so an estimate can be made as to how much physical space has actually been written. Moreover, and as indicated by process 606 of FIG. 3, two or more bloom filters for a number N of files can be readily combined by using a bitwise or operation on the bloom filter bitmaps. For example, given the bitmaps for file F1 and file F2, those bitmaps can be combined thus: B{F1, F2}=BF1\BF2” (Menezes Col 10 lines 35-50)

It would have been obvious before the effective filing date of the invention to one of ordinary skill in the art to combine the counting-based processing of data in Menezes with Barber to yield the predictable results of reducing storage space by limiting redundancy and identifying the unique segments.


Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Barber (published May 30, 2013) as applied to claim 8 above, and further in view of Lee et al. (US 2004/0078544) (hereinafter Lee) (published April 22, 2004).
Regarding Claim 11, Barber discloses the method of claim 8 but does not explicitly state further comprising distributing the first portion of the dataset to two or more memory devices at the first memory module.
Lee discloses further comprising distributing the first portion of the dataset to two or more memory devices at the first memory module.
“In order to obtain different outputs with each other in the output queue, the key transition bit 405 can be chosen from each first-entry of the first input queue to replace the stable bit of each first-output in the first output queue. Therefore, a plurality of second-outputs are obtained and form a second output queue, and obviously, each second-output is different from the subsequent second-output. The goal of distributing the data of cache to pages of different banks is achieved efficiently” (Lee [0063])

It would have been obvious before the effective filing date of the invention to one of ordinary skill in the art to combine the distribution of data to different banks in Lee with Barber to yield the predictable results of more efficient access to the storage by utilizing multiple banks. “Therefore, the goal of distributing pages into different memory banks is achieved. Moreover, not only memory page hit rate is increased by normally distributing pages into different memory banks, but also the localization of each memory address is reserved longer than that by the existing method” (Lee [0028])

Regarding Claim 12, Lee further discloses further comprising distributing the first portion of the dataset to two or more ranks at the first memory module.
“In order to obtain different outputs with each other in the output queue, the key transition bit 405 can be chosen from each first-entry of the first input queue to replace the stable bit of each first-output in the first output queue. Therefore, a plurality of second-outputs are obtained and form a second output queue, and obviously, each second-output is different from the subsequent second-output. The goal of distributing the data of cache to pages of different banks is achieved efficiently” (Lee [0063])


Claims 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Barber (published May 30, 2013) as applied to claim 8 above, and further in view of Liao (US 2016/0299874) (hereinafter Liao) (published October 13, 2016).
Regarding Claim 14, Barber discloses the method of claim 8 but does not explicitly state further comprising interleaving memory accesses of the first portion of the dataset between a first task and a second task.
Liao discloses further comprising interleaving memory accesses of the first portion of the dataset between a first task and a second task.
“Other techniques that may be used to optimize memory accesses may relate to changing how memory is accessed before performing a given task. The effects of hot spots in memory when performing reductions or back-transformations may be mitigated by accessing global memory in an interleaved fashion. By interleaving memory accesses a first processor may access a first memory element (or a first portion of memory) at one moment in time while a second processor accesses a second different memory element (or a second portion of memory). Subsequently, the second processor may access the memory the first portion of memory and the first processor may access the second portion of memory” (Liao [0148])

It would have been obvious before the effective filing date of the invention to one of ordinary skill in the art to combine the interleaving of memory accesses in Liao with Barber to yield the predictable results of a more efficient memory, because accesses need not stall while waiting for a busy portion of memory to become available.
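By way of illustration only, the interleaved access pattern described in Liao [0148], in which two processors access different portions of memory at one moment and then exchange portions, may be sketched as a rotating schedule. This is a minimal Python sketch and is not code from Liao or from the application. The function name interleaved_schedule and the round-robin rotation are assumptions made for the example, and the sketch assumes there are at least as many portions as workers.

```python
def interleaved_schedule(num_workers, num_portions):
    """Return, for each time step, a mapping of worker -> memory portion.

    Within one step no two workers share a portion, provided that
    num_workers <= num_portions (an assumption of this sketch).
    """
    schedule = []
    for step in range(num_portions):
        # Each worker starts at its own portion and rotates by one per step.
        schedule.append({w: (w + step) % num_portions for w in range(num_workers)})
    return schedule

schedule = interleaved_schedule(2, 2)
```

With two workers and two portions, the first step assigns each worker its own portion and the second step swaps them, so every worker visits every portion without a collision in any step.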

Regarding Claim 15, Liao further discloses further comprising switching between the first task and the second task between memory accesses of the first portion of the dataset.
“Other techniques that may be used to optimize memory accesses may relate to changing how memory is accessed before performing a given task. The effects of hot spots in memory when performing reductions or back-transformations may be mitigated by accessing global memory in an interleaved fashion. By interleaving memory accesses a first processor may access a first memory element (or a first portion of memory) at one moment in time while a second processor accesses a second different memory element (or a second portion of memory). Subsequently, the second processor may access the memory the first portion of memory and the first processor may access the second portion of memory” (Liao [0148])


Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Barber (published May 30, 2013) as applied to claim 8 above, and further in view of LAYER et al. (US 2016/0132640) (hereinafter Layer) (published May 12, 2016).
Regarding Claim 17, Barber discloses the method of claim 8 but does not explicitly state wherein: the dataset comprises a genetic sequence; the first local data structure comprises a Bloom filter; and the Bloom filter comprises one or more k-mers of the genetic sequence.
Layer discloses wherein: the dataset comprises a genetic sequence; the first local data structure comprises a Bloom filter; and the Bloom filter comprises one or more k-mers of the genetic sequence.
“Next, the present inventors discuss creating k-mer profiles of a given species using Bloom filters 308 (although it should be appreciated that other types of probabilistic data structure may be used as desired or required). In an aspect of an embodiment, the method may include creating a k-mer profile 306 of a species's genome 304 by scanning its genome sequence and cataloging each distinct subsequence of length k. A guiding principle behind constructing k-mer profiles is that they directly reflect the DNA content of the species's genome” (Layer [0065])

It would have been obvious before the effective filing date of the invention to one of ordinary skill in the art to combine the use of a Bloom filter in Layer with Barber to yield the predictable results of being able to uniquely identify parts of the dataset. The use of a genetic sequence as the dataset is a design choice, as it would not affect how the invention operates.
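By way of illustration only, cataloging each distinct subsequence of length k in a Bloom filter, as described in Layer [0065], may be sketched as follows. This is a minimal Python sketch and is not code from Layer or from the application. The class name BloomFilter, the function name kmers, the filter size, the number of hash functions, and the SHA-256-based hashing are assumptions made for the example.

```python
import hashlib

def kmers(sequence, k):
    """Return the set of distinct length-k subsequences of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

class BloomFilter:
    """A small Bloom filter; sizes are assumed values for illustration."""

    def __init__(self, num_bits=2048, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitmap stored as a Python int

    def _positions(self, item):
        # Derive one bit position per salted SHA-256 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May report false positives, but never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

sequence = "ACGTACGTGA"
profile = BloomFilter()
for kmer in kmers(sequence, 4):
    profile.add(kmer)
```

A membership query such as "ACGT" in profile returns True for every k-mer that was added; a k-mer that was never added is usually, but not always, reported as absent.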


Allowable Subject Matter
Claims 10, 13, 16, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: 
The prior art discloses the creation of local data structures from data distributed to memory modules and the merging of multiple local data structures together. The prior art, however, does not disclose or suggest that the first local data structure and the second local data structure form a merged data structure and that the merged data structure is scattered to the first memory module and the second memory module, in combination with the other elements recited in the claims.
The prior art discloses the construction of a local data structure by a processing element and making the data structure accessible by multiple processing elements, but does not disclose or suggest that a single local data structure is constructed by multiple processing elements and, further, that the workload of those elements is balanced, in combination with the other elements recited in the claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY LI whose telephone number is (571)270-5967. The examiner can normally be reached Monday to Friday 10:00 AM to 6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones, can be reached at (571) 272-4085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SIDNEY LI/Examiner, Art Unit 2136

/EDWARD J DUDEK JR/Primary Examiner, Art Unit 2136