On October 18, 2022, the 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA) officially released its list of accepted papers. The organizing committee received 364 submissions from the world's top research institutions and universities, of which 91 were accepted, for an acceptance rate of 25%. The paper titled “Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance”, by the research team of Dr. Yu Wang and Dr. Huazhong Yang from the Institute of Circuits and Systems of our department, in collaboration with Dr. Alex K. Jones from the University of Pittsburgh and Dr. Donald Kline Jr. from Intel Corp., was among those accepted. The first author of the paper is Dr. Jiangwei Zhang, an assistant researcher in our department, and the corresponding author is Dr. Yu Wang. The main collaborators also include Chong Wang and Zhenhua Zhu, doctoral students of the Department of Electronic Engineering.
RETROFIT: Fault-aware row-level wear leveling, combined with Page Protecting Pointers (PPPs) built from retired rows, can increase memory lifetime by 2.6 to 16.0 times compared with the state-of-the-art method.
Phase Change Memory (PCM) and Resistive Random Access Memory (RRAM) have great potential to replace traditional memory because of their high density and high bandwidth. However, both PCM and RRAM suffer from limited write endurance. Wear leveling (WL) is critical for extending the lifetime of these memories and avoiding early hard faults. In addition to WL, spare rows and targeted fault correction methods can further extend lifetime after hard faults occur. However, as technology scaling intensifies process variation, existing WL techniques are no longer sufficient to cope with it.
Figure 1 Wear Leveling Technology (From https://www.transcend-info.com/Embedded/Essay-22)
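As a rough illustration of why uniform wear leveling struggles under process variation, the toy model below computes how many writes a small array absorbs before its first row fails when writes are spread evenly. The function name, the four-row array, and the endurance values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def writes_until_first_failure(endurance, shares):
    """Total writes the array absorbs before any row exhausts its endurance.

    endurance[i]: number of writes row i can tolerate (hypothetical values).
    shares[i]:    fraction of all incoming writes directed to row i.
    """
    # Row i fails after endurance[i] / shares[i] total writes to the array;
    # the array's lifetime ends at the earliest such failure.
    return min(e / s for e, s in zip(endurance, shares) if s > 0)


# Assumed example: three strong rows and one weak row caused by process variation.
endurance = [100, 100, 100, 40]

# Perfectly uniform wear leveling sends the same share of writes to every row.
uniform_shares = [0.25] * len(endurance)

lifetime = writes_until_first_failure(endurance, uniform_shares)
print(lifetime)  # 160.0
```

In this toy setting the weak row fails after 160 total writes, although the rows together could in principle absorb 340, which is the gap that spare rows and fault-aware schemes try to close.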
To address these problems, the research team of Dr. Yu Wang and Dr. Huazhong Yang proposed novel fault-aware WL schemes that allocate write frequencies according to the strength of each row and handle write imbalance across columns. The work uses runtime detection to identify weak rows and protects them before they wear out. In particular, the row-level WL scheme, called RETROFIT, strategically uses the spare rows provided for redundancy to guard against early cell wear-out. RETROFIT is compatible with error-correcting codes (ECC) and with error correction schemes that guarantee mitigation of hard faults. When a spare row completely replaces a retired row, the retired row is not discarded; instead, it is repurposed to assist with column sparing. It becomes a group of Page Protecting Pointers (PPPs), which exploit error correction potential that would otherwise be discarded and further enhance the leveling ability of RETROFIT. To relieve column-level imbalance, the work puts idle error correction bits to use, before they are needed for correction, to reduce the average number of bit flips. The evaluation demonstrates that RETROFIT and enhanced RETROFIT with the PPPs improve lifetime by as much as 0.64× and 5.4× in the average case, respectively, over the state-of-the-art row-level method, while also reducing area overhead. In the worst-case scenario, these improvements increase further to 2.6× and 16.0×. Combined with the proposed column-level WL, enhanced RETROFIT achieves an overall 1.5× memory lifetime improvement over perfectly uniform wear leveling with equal storage overhead.
Figure 2 The asymmetric wear-leveling method RETROFIT is designed according to the fault distribution
Figure 3 The retired row is redesigned as a set of Page Protecting Pointers
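The idea of allocating write frequencies according to row strength can be sketched with a toy model. The code below is not RETROFIT's actual mechanism, which relies on runtime detection, spare rows, and PPPs; it only shows, under the assumption that each row's endurance is known in advance, how strength-proportional write shares compare with uniform shares. All names and values are hypothetical.

```python
def writes_until_first_failure(endurance, shares):
    """Total writes the array absorbs before any row exhausts its endurance."""
    return min(e / s for e, s in zip(endurance, shares) if s > 0)


def proportional_shares(endurance):
    """Give each row a share of the writes proportional to its endurance."""
    total = sum(endurance)
    return [e / total for e in endurance]


# Assumed example: three strong rows and one weak row (hypothetical endurances).
endurance = [100, 100, 100, 40]

uniform = [1 / len(endurance)] * len(endurance)
fault_aware = proportional_shares(endurance)

uniform_lifetime = writes_until_first_failure(endurance, uniform)
fault_aware_lifetime = writes_until_first_failure(endurance, fault_aware)

# Ratio of the two lifetimes in this toy model (roughly 2.1 for these values).
improvement = fault_aware_lifetime / uniform_lifetime
print(round(improvement, 3))
```

With proportional shares every row wears out at about the same time, so the array absorbs close to the sum of all row endurances rather than being limited by its weakest row. A real scheme has to approximate this without knowing the endurances beforehand, which is why the work depends on runtime detection of weak rows.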
The research team of Dr. Yu Wang and Dr. Huazhong Yang from the Department of Electronic Engineering, Tsinghua University, has for many years been committed to research on non-volatile memory-based storage and in-memory computing architectures. The team has published dozens of papers in top international academic conferences and journals, such as ISCA, HPCA, DAC, and TCAD. Its research achievements have received support and attention from government-sponsored research organizations such as the National Natural Science Foundation of China, as well as from many leading enterprises in the industry.
HPCA: A bellwether for the development of high-performance computing and the chip industry
HPCA is a top international academic conference in the field of high-performance computing and chips. Its review process is extremely strict, and its selection standard for papers is equally high, so papers accepted at HPCA have consistently been of high academic and industrial value. Major enterprises such as Intel, Nvidia, Google, Samsung, and AMD have published research results at the conference, which has helped guide the direction of industry development. The acceptance of this paper demonstrates our innovative research ability in the field of high-performance computing and chips.
Figure 4 The IEEE HPCA will be held from February 25 to March 1, 2023
If you are interested in this work, please feel free to contact Dr. Jiangwei Zhang (E-mail: email@example.com).