
  2024

TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading [PDF]

Kun Wu*, Jeongmin Brian Park*, Xiaofan Zhang*, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu (*equal contributors)

arXiv:2408.10013, 2024


ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization [PDF]

Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Lin

38th Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, Canada, Dec. 2024


New Solutions on LLM Acceleration, Optimization, and Application [PDF]

[Invited] Yingbing Huang, Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

61st Design Automation Conference (DAC), San Francisco, CA, June 2024


AutoAI2C: An Automated Hardware Generator for DNN Acceleration on both FPGA and ASIC [PDF]

Yongan Zhang, Xiaofan Zhang, Pengfei Xu, Yang Zhao, Cong Hao, Yue Wang, Chaojian Li, Deming Chen, Yingyan Lin

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2024


Software/Hardware Co-design for LLM and Its Application for Design Verification [PDF]

Jiaxin Wan, Yingbing Huang, Yuhong Li, Hanchen Ye, Jinghua Wang, Xiaofan Zhang, Deming Chen

29th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024

HomeSGN: A Smarter Home with Novel Rule Mining Enabled by a Scorer-Generator GAN

Zehua Yuan, Junhao Pan, Xiaofan Zhang, Deming Chen

29th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024

  2023

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems [PDF]

Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen

Book chapter in Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing: Software Optimizations and Hardware/Software Co-design, Springer Nature


Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization [PDF]

Clemens Schaefer, Navid Lambert-Shirzad, Xiaofan Zhang, Chiachen Chou, Tom Jablin, Jian Li, Elfie Guo, Caitlin Stanton, Siddharth Joshi, Yu Emma Wang

arXiv preprint arXiv:2306.04879, 2023

EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search [PDF]

Qian Jiang*, Xiaofan Zhang*, Deming Chen, Minh N. Do, Raymond A. Yeh (*equal contributors)

40th International Conference on Machine Learning (ICML) Workshop on Differentiable Almost Everything, July 2023

  2022

YouHome System and Dataset: Making Your Home Know You Better    

[Invited] Junhao Pan, Zehua Yuan, Xiaofan Zhang, Deming Chen

IEEE International Symposium on Smart Electronic Systems (iSES), Dec. 2022


Algorithm/Accelerator Co-Design and Co-Search for Edge AI [PDF]

Xiaofan Zhang, Yuhong Li, Junhao Pan, Deming Chen

IEEE Transactions on Circuits and Systems II, 2022


Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems [PDF]

Xiaofan Zhang, Yuan Ma, Jinjun Xiong, Wen-mei Hwu, Volodymyr Kindratenko, Deming Chen

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 41, pp. 1606-1619, 2022


AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models [PDF]

Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang

arXiv preprint: 2201.08539, 2022

  2021

F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding [PDF]

Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li

58th Design Automation Conference (DAC), San Francisco, CA, Dec. 2021


Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition [PDF]

Saranyu Chattopadhyay, Florian Lonsing, Luca Piccolboni, Deepraj Soni, Peng Wei, Xiaofan Zhang, Yuan Zhou, Luca Carloni, Deming Chen, Jason Cong, Ramesh Karri, Zhiru Zhang, Caroline Trippel, Clark Barrett, Subhasish Mitra

21st Formal Methods in Computer-Aided Design (FMCAD), New Haven, CT, Oct. 2021

Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs [PDF]

Junsong Wang, Xiaofan Zhang, Yubo Li, Yonghua Lin

50th International Conference on Parallel Processing (ICPP), Lemont, IL, Aug. 2021 (Virtual)


Efficient Methods for Mapping Neural Machine Translator on FPGAs [PDF]

Qin Li*, Xiaofan Zhang*, Jinjun Xiong, Wen-mei Hwu, Deming Chen (*equal contributors)

IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 32, pp. 1866-1877, 2021

Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment [PDF]

Xiaofan Zhang, Hanchen Ye, Deming Chen

Conference on Machine Learning and Systems (MLSys) workshop on MLBench, Apr. 2021 (Virtual)

  2020

DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator [PDF]

 

Xiaofan Zhang*, Hanchen Ye*, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen (*equal contributors)

 

39th International Conference on Computer Aided Design (ICCAD), San Diego, CA, Nov. 2020 (Virtual)

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices [PDF]

[Invited] Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-mei Hwu, Deming Chen

30th ACM Great Lakes Symposium on VLSI (GLSVLSI), Sep. 2020 (Virtual)

A-QED Verification of Hardware Accelerators [PDF]

Eshan Singh, Florian Lonsing, Saranyu Chattopadhyay, Max Strange, Peng Wei, Xiaofan Zhang, Yuan Zhou, Jason Cong, Deming Chen, Zhiru Zhang, Priyanka Raina, Clark Barrett, Subhasish Mitra

57th Design Automation Conference (DAC), San Francisco, CA, July 2020 (Virtual)

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions [PDF]

Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen

57th Design Automation Conference (DAC), San Francisco, CA, July 2020 (Virtual)

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation [PDF]

Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen, Deming Chen

57th Design Automation Conference (DAC), San Francisco, CA, July 2020 (Virtual)

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems [PDF]

🏆 DAC'19 System Design Contest Champion Design

Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Conference on Machine Learning and Systems (MLSys), Mar. 2020

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs [PDF]

Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin

28th International Symposium on Field-Programmable Gate Arrays (FPGA), Seaside, CA, Feb. 2020

2019

T-DLA: An Open-source Deep Learning Accelerator for Ternarized DNN Models on Embedded FPGA [PDF]

Yao Chen, Kai Zhang, Cheng Gong, Cong Hao, Xiaofan Zhang, Tao Li, Deming Chen

IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Miami, FL, July 2019

uL2Q: An Ultra-Low Loss Quantization Method for DNN Compression

Cheng Gong, Tao Li, Ye Lu, Cong Hao, Xiaofan Zhang, Deming Chen, Yao Chen

The International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, July 2019

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection [PDF]

Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

(Technical report) arXiv preprint: 1906.10327, June 2019
 

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices [PDF]

🏆 Best Poster Award

Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen

36th International Conference on Machine Learning (ICML) Workshop on ODML-CDNNR, Long Beach, CA, June 2019

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge [PDF]

Cong Hao*, Xiaofan Zhang*, Yuhong Li, Sitao Huang, Jinjun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen (*equal contributors)

56th Design Automation Conference (DAC), Las Vegas, NV, June 2019

Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs [PDF]

 

Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, Deming Chen

 

27th International Symposium on Field-Programmable Gate Arrays (FPGA), Seaside, CA, Feb. 2019

 

SiamVGG: Visual Tracking using Deeper Siamese Networks [PDF]

 

Yuhong Li, Xiaofan Zhang

 

arXiv preprint: 1902.02804, Feb. 2019

Implementing Neural Machine Translation with Bi-Directional GRU and Attention Mechanism on FPGAs Using HLS [PDF]

 

Qin Li*, Xiaofan Zhang*, Jinjun Xiong, Wen-mei Hwu, Deming Chen (*equal contributors)

24th Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, Jan. 2019 

2018

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs [PDF]

🏆 Best Paper Award

 

Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen

 

37th International Conference on Computer Aided Design (ICCAD), San Diego, CA, Nov. 2018

Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA [PDF]

 

Junsong Wang, Qiuwen Lou, Xiaofan Zhang, Chao Zhu, Yonghua Lin, Deming Chen

 

28th International Conference on Field-Programmable Logic and Applications (FPL), Dublin, Ireland, Aug. 2018

 

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes [PDF]

 

Yuhong Li, Xiaofan Zhang, Deming Chen

 

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, June 2018

 

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs [PDF]

 

Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, Jinjun Xiong and Deming Chen

 

28th ACM Great Lakes Symposium on VLSI (GLSVLSI), Chicago, IL, May 2018

 

AccDNN: an IP-based DNN Generator for FPGAs

 

Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen

 

26th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Boulder, CO, April 2018

 

2017

An Energy Efficient Approach for C4.5 Algorithm using OpenCL Design Flow [PDF]

 

Hai Peng, Xiaofan Zhang, Letian Huang

 

16th International Conference on Field-Programmable Technology (FPT), Melbourne, Australia, December 2017

 

Machine Learning on FPGAs to Face the IoT Revolution [PDF]

 

[Invited] Xiaofan Zhang*, Anand Ramachandran*, Chuanhao Zhuge*, Di He, Wei Zuo, Zuofu Cheng, Kyle Rupnow, Deming Chen (*equal contributors)

 

36th International Conference on Computer Aided Design (ICCAD), Irvine, CA, November 2017

 

High-Performance Video Content Recognition with Long-term Recurrent Convolutional Network for FPGA [PDF]

 

Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen

 

27th International Conference on Field-Programmable Logic and Applications (FPL), Ghent, Belgium, September 2017

  2016

Tolerating transient illegal turn faults in NoCs [PDF]

Letian Huang, Xiaofan Zhang, Masoumeh Ebrahimi, Guangjun Li

Microprocessors and Microsystems, Vol. 43, pp. 104-115, 2016

Non-Blocking Testing for Network-on-Chip [PDF]

Letian Huang, Junshi Wang, Masoumeh Ebrahimi, Masoud Daneshtalab, Xiaofan Zhang, Guangjun Li, Axel Jantsch

IEEE Transactions on Computers, Vol. 65, pp. 679-692, 2016
 

  2015

Fault-Resilient Routing Unit in NoCs [PDF]

Xiaofan Zhang, Masoumeh Ebrahimi, Letian Huang, Guangjun Li

28th IEEE International System-on-Chip Conference (SOCC), Beijing, China, September 2015

A Network-Level Solution for Fault Detection, Masking, and Tolerance in NoCs [PDF]

Xiaofan Zhang, Masoumeh Ebrahimi, Letian Huang, Guangjun Li, Axel Jantsch

 

23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), Turku, Finland, March 2015
