Song Han – Assistant Professor, MIT EECS

Song Han is an Associate Professor (starting July 1, 2022) at MIT EECS. He received his PhD degree from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique that can reduce neural network size by an order of magnitude without losing accuracy, and the hardware implementation “efficient inference engine” that first exploited pruning and weight sparsity in deep learning accelerators. His team’s work on hardware-aware neural architecture search (ProxylessNAS, Once-for-All Network (OFA), MCUNet) was integrated into Facebook, Amazon, Microsoft, Intel, and SONY products, and received first place in six low-power computer vision contest awards at flagship AI conferences. Song received Best Paper awards at ICLR’16 and FPGA’17, and multiple faculty awards from Amazon, SONY, Facebook, NVIDIA and Samsung. Song was named to the “35 Innovators Under 35” list by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning” and the IEEE “AIs 10 to Watch: The Future of AI” award.

Group Website, Google Scholar, YouTube, Twitter, Github, LinkedIn

Research Interests

  • TinyML and intelligent internet of things (IIoT), efficient training and inference, model compression and acceleration: [MLSys’22][ICLR’22][NeurIPS’21][NeurIPS’21][MLSys’21][NeurIPS’20, spotlight][NeurIPS’20][ICLR’20][CVPR’20][CVPR’20][ICLR’19][CVPR’19, oral][ECCV’18][ICLR’16, BP][NIPS’15]
  • Efficient AI applications on edge devices (video / point cloud / NLP / GAN): [CVPR’22][ICRA’21][CVPR’21][NeurIPS’20][ACL’20][CVPR’20][ECCV’20][ICLR’20][NeurIPS’19, spotlight][ICCV’19]

  • Hardware accelerator for neural networks: [MICRO’21][HPCA’21][HPCA’20][FPGA’17, BP][ISCA’16]
  • Machine learning for hardware design, algorithm-hardware co-design: [DAC’21][DAC’20][NeurIPS’19 W]
  • Quantum AI System: [HPCA’22][DAC’22][DAC’22][TorchQuantum Library]

Our research has influenced and landed in many industrial products: Intel OpenVINO, Intel Neural Network Distiller, Apple Neural Engine, NVIDIA Sparse Tensor Core, AMD-Xilinx Vitis AI, Qualcomm AI Model Efficiency Toolkit (AIMET), Amazon AutoGluon, Facebook PyTorch, Microsoft NNI, SONY Neural Architecture Search Library, SONY Model Compression Toolkit, and the ADI MAX78000/MAX78002 Model Training and Synthesis Tool.


Awards

  • Samsung Global Research Outreach (GRO) Award, 2021
  • IEEE “AIs 10 to Watch: The Future of AI” Award, 2020
  • NSF CAREER Award, 2020
  • NVIDIA Academic Partnership Award, 2020, 2021
  • MIT Technology Review list of 35 Innovators Under 35, 2022
  • SONY Faculty Award, 2017/2018/2020
  • Amazon Machine Learning Research Award, 2018/2019
  • Facebook Research Award, 2019
  • Best Paper Award, FPGA’2017
  • Best Paper Award, ICLR’2016

Competition Awards

  • First place, 6th AI Driving Olympics, NuScenes Segmentation Challenge @ICRA 2021 [SPVNAS]

  • First place, 5th Low-Power Computer Vision Challenge, CPU detection track & FPGA track, Aug 2020 [OFA]

  • First place, 3D semantic segmentation on SemanticKITTI, July 2020

  • First place, 4th Low-Power Computer Vision Challenge, both CPU classification and detection tracks, Jan 2020

  • First place, 3rd Low-Power Computer Vision Challenge, DSP track, @ICCV 2019


  • First place, MicroNet Challenge, NLP track (WikiText-103), @NeurIPS 2019


  • First place, Visual Wake Words Challenge, TF-lite track, @CVPR 2019


MCUNet [NeurIPS’20 spotlight][NeurIPS’21]:

– MIT News, Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices

– Wired, AI Algorithms Are Slimming Down to Fit in Your Refrigerator

– MIT News, System brings deep learning to “internet of things” devices

– Morning Brew, Researchers Figured Out How to Fit More AI Than Ever onto Internet of Things Microchips

– IBM, New IBM-MIT system brings AI to microcontrollers – paving the way to ‘smarter’ IoT

– Analytics Insight, Amalgamating ML And IoT In Smart Home Devices


– MIT Homepage Spotlight, A language learning system that pays attention — more efficiently than ever before

– All About Circuits, Is Hardware the Key to Advancing Natural Language Processing?

– Embedded Computing Design, MIT’s SpAtten Architecture Uses Attention Mechanism for Advanced NLP


– MIT News, Making quantum circuits more robust


– VentureBeat, MIT researchers claim augmentation technique can train GANs with less data

Once-For-All Network

– VentureBeat, MIT aims for energy efficiency in AI model training

– MIT News, Reducing the carbon footprint of artificial intelligence

– Qualcomm, Research from MIT shows promising results for on-device AI

– TechHQ, How MIT is making ground towards ‘greener’ AI

– AI Daily, New MIT Architecture May Lead To Smaller Carbon Footprints For Neural Networks

– Inhabitat, MIT moves toward greener, more sustainable artificial intelligence

Hardware-Aware Transformer [ACL’20]:

– MIT News, Shrinking deep learning’s carbon footprint

– VentureBeat, New AI technique speeds up language models on edge devices

Temporal Shift Module

– NVIDIA, New MIT Video Recognition Model Dramatically Improves Latency on Edge Devices

– MIT Technology Review, Powerful computer vision algorithms are now small enough to run on your phone

– Engadget, MIT-IBM developed a faster way to train video recognition AI

– MIT News, Faster video recognition for the smartphone era


– IEEE Spectrum, Using AI to Make Better AI

– MIT News, Kicking neural network design automation into high gear


We thank the generous sponsors of our research: ADI, Amazon, AMD, Apple, ARM, Cognex, Facebook, Ford, Google, Hyundai, IBM, Intel, Microsoft, MIT AI Hardware Program, MIT Microsystems Technology Lab, MIT-IBM Watson AI Lab, National Science Foundation, NVIDIA, Qualcomm, Samsung, Semiconductor Research Corporation, SONY, TI.


News

  • April 2022: Network Augmentation for Tiny Deep Learning is presented at ICLR’22. paper / code

  • March 2022: A journal paper that summarizes our philosophies for mobile deep learning: Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications. We first present popular model compression methods, including pruning, factorization, quantization, as well as light-weight primitives. To reduce the manual design cost, we present the hardware-aware AutoML framework, including neural architecture search (ProxylessNAS, Once-for-All) and automated pruning (AMC) and quantization (HAQ). We then cover efficient on-device training to enable user customization based on the local data on mobile devices (TinyTL). Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce the efficient deep learning system design from both software and hardware perspectives. paper / code
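A minimal sketch of one of the compression methods surveyed above, one-shot magnitude pruning, in NumPy (the function name and the toy weights are illustrative, not from the paper):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights (one-shot pruning).

    Assumes distinct magnitudes; ties at the threshold would prune extra weights.
    """
    k = int(w.size * sparsity)              # number of weights to remove
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(w) > thresh, w, 0.0)

w = np.arange(1.0, 11.0)                    # toy layer weights 1..10
pruned = magnitude_prune(w, 0.3)            # the three smallest weights become 0
```

In practice, pruning is followed by fine-tuning (as in deep compression) to recover the accuracy lost by removing the small weights.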
  • March 2022: Lite Pose: Efficient Architecture for Human Pose Estimation is accepted by CVPR’22. Lite Pose accelerates human pose estimation by up to 5x on Snapdragon 855, Jetson Nano and Raspberry Pi.
  • Feb 2022: QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits is presented at HPCA 2022. paper / qmlsys website / code
  • Feb 2022: QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization is accepted by DAC’22. Due to the large quantum noises (errors), the performance of quantum AI models degrades severely on real quantum devices. We present QuantumNAT, a QNN-specific framework that performs noise-aware optimizations in both the training and inference stages to improve robustness. We propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve robustness against noise, we propose noise injection during training, inserting quantum error gates into the QNN according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving a denoising effect. paper / website / code
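A toy NumPy sketch of the post-measurement quantization idea (the level count and noise values are illustrative; the real QuantumNAT operates on hardware measurement outcomes):

```python
import numpy as np

def post_measurement_quantize(meas, n_levels=4):
    """Snap measured expectation values in [-1, 1] to the nearest of
    n_levels discrete levels; noise smaller than half the level spacing
    is removed entirely (the denoising effect)."""
    levels = np.linspace(-1.0, 1.0, n_levels)
    idx = np.argmin(np.abs(meas[:, None] - levels[None, :]), axis=1)
    return levels[idx]

clean = np.array([-1.0, 1.0 / 3.0, 1.0])           # ideal expectation values
noisy = clean + np.array([0.05, -0.08, -0.04])     # small hardware noise
denoised = post_measurement_quantize(noisy)        # recovers the clean values
```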
  • Feb 2022: QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning is accepted by DAC’22. To achieve practical training of quantum AI models, the training process needs to be offloaded to real quantum machines instead of exponential-cost classical simulators. One common approach to obtaining QNN gradients is parameter shift, whose cost scales linearly with the number of qubits. We present QOC, the first experimental demonstration of practical on-chip parameterized quantum circuit training with parameter shift. However, we find that due to the significant quantum errors (noises) on real machines, gradients obtained from naive parameter shift have low fidelity and thus degrade the training accuracy. To this end, we further propose probabilistic gradient pruning to first identify gradients with potentially large errors and then remove them. Specifically, small gradients have larger relative errors than large ones, and thus have a higher probability of being pruned. paper / website / code
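The parameter-shift rule and the probabilistic pruning idea above can be sketched in a few lines (the expectation function is a classical toy stand-in so the gradient can be checked analytically; the pruning schedule is simplified from the paper):

```python
import numpy as np

def expectation(theta):
    # stand-in for an expectation value measured on quantum hardware;
    # for f(theta) = sum(cos(theta)) the analytic gradient is -sin(theta)
    return float(np.sum(np.cos(theta)))

def parameter_shift_grad(theta, shift=np.pi / 2):
    # two circuit evaluations (+shift, -shift) per parameter
    grad = np.empty_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = shift
        grad[i] = 0.5 * (expectation(theta + e) - expectation(theta - e))
    return grad

def probabilistic_gradient_prune(grad, rng):
    # small gradients carry larger relative measurement error,
    # so they are dropped with higher probability
    keep_prob = np.abs(grad) / (np.abs(grad).max() + 1e-12)
    return np.where(rng.random(grad.size) < keep_prob, grad, 0.0)

g = parameter_shift_grad(np.array([0.0, np.pi / 2]))  # matches -sin(theta)
```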
  • Jan 2022: Network Augmentation for Tiny Deep Learning is accepted by ICLR’22. Training tiny models is different from training large models: rather than augmenting the data, we should augment the model, since tiny models tend to suffer from limited capacity. To alleviate this issue, NetAug augments the network (reverse dropout) instead of inserting noise into the dataset or the network. It puts the tiny model into larger models and encourages it to work as a sub-model of the larger models to get extra supervision, in addition to functioning as an independent model. paper
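A minimal NumPy sketch of the sub-model idea (the dimensions and the single linear layer are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_tiny, d_aug = 8, 4, 16      # the augmented width contains the tiny width

W = rng.normal(size=(d_aug, d_in))  # one shared weight matrix; the tiny model
x = rng.normal(size=d_in)           # is the first d_tiny rows

y_tiny = W[:d_tiny] @ x             # forward pass as the independent tiny model
y_aug = W @ x                       # forward pass as a sub-model of the wider model

# during training, losses on both outputs give the shared rows extra
# supervision; at deployment only W[:d_tiny] is exported
```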
  • Jan 2022: TorchSparse: Efficient Point Cloud Inference Engine is accepted by MLSys’22. The sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently; existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. We introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: data movement and irregular computation. It optimizes the data orchestration by quantization and fused locality-aware memory access, reducing the memory movement cost by 2.7x. It also adopts adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. paper / code

  • Dec 2021: MCUNet-v2: Memory Efficient Inference for Tiny Deep Learning is presented at NeurIPS’21. Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. The imbalanced memory distribution of CNNs exacerbates the issue: the first several blocks have an order of magnitude larger memory usage than the rest of the network. We propose patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, a naive implementation brings overlapping patches and computation overhead. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. MCUNetV2 sets a record ImageNet accuracy on MCU (71.8%), and achieves >90% accuracy on the visual wake words dataset under just 32kB SRAM. MCUNetV2 also unblocks object detection on tiny devices, achieving 16.9% higher mAP on Pascal VOC compared to the state-of-the-art result. Our study largely addressed the memory bottleneck in tinyML and paved the way for vision applications beyond classification, such as detection. paper / website / slides / demo / demo2 / MIT News / TechTalks
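The peak-memory argument can be made concrete with a back-of-the-envelope sketch (the layer shapes are illustrative, and the halo overlap between patches that MCUNetV2's redistribution addresses is ignored):

```python
def act_bytes(h, w, c):
    # int8 activations: one byte per element
    return h * w * c

# whole-image inference of an early block must hold input + output together
full_peak = act_bytes(224, 224, 16) + act_bytes(112, 112, 16)

# patch-by-patch inference holds only one of 4x4 spatial patches at a time
patch_peak = act_bytes(56, 56, 16) + act_bytes(28, 28, 16)

ratio = full_peak // patch_peak   # 16x lower peak memory in this toy setup
```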

  • Dec 2021: Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning is presented at NeurIPS’21. Federated learning suffers from high communication latency. We propose Delayed Gradient Averaging (DGA), which delays the averaging step to allow local computation to run ahead of communication. We theoretically prove that DGA attains a similar convergence rate as FedAvg, and empirically show that our algorithm can tolerate high network latency without compromising accuracy. DGA is implemented on a 16-node Raspberry Pi cluster with both IID and non-IID partitions, and brings 2.55x to 4.07x speedup. paper / website / slides / poster
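A toy simulation of the delayed-averaging update (the worker gradients and the delay are made up; the real system overlaps this bookkeeping with network communication):

```python
import numpy as np
from collections import deque

def dga(grads_per_step, lr=0.1, delay=2):
    """Toy Delayed Gradient Averaging: each worker applies its local gradient
    immediately; the globally averaged gradient arrives `delay` steps late and
    then replaces the stale local contribution."""
    n_workers, dim = grads_per_step[0].shape
    params = np.zeros((n_workers, dim))
    in_flight = deque()                      # simulated communication pipe
    for g in grads_per_step:
        params -= lr * g                     # local update, no waiting
        in_flight.append((g, g.mean(axis=0)))
        if len(in_flight) > delay:           # delayed all-reduce result arrives:
            g_old, g_avg = in_flight.popleft()
            params += lr * (g_old - g_avg)   # swap local gradient for the average
    return params

# two workers with constant gradients 1 and 3; the average is 2, so all but
# the last `delay` steps end up using the averaged gradient
steps = [np.array([[1.0], [3.0]])] * 5
out = dga(steps)
```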
  • Dec 2021: NAAS: Neural Accelerator Architecture Search is presented at DAC’21. We proposed a novel data-driven approach for AI-designed AI accelerators. Such a data-driven method can find knowledge that is hard for humans to express explicitly, and it scales efficiently. The design spaces of hardware, compiler, and neural networks are tightly entangled, so joint optimization is better than separate optimization. Given the huge design space, a data-driven approach is desirable: with the same runtime, machine learning methods can explore more data points than human designers. NAAS proposes a data-driven, automatic design space exploration of neural accelerator architectures that outperforms human design. paper / website / slides / video

  • Oct 2021: QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits to appear at HPCA’22. Quantum noise is the fundamental challenge in Noisy Intermediate-Scale Quantum (NISQ) computers. We propose QuantumNAS (NAS: Noise-Adaptive Search), a comprehensive framework for noise-adaptive co-search of the variational circuit and qubit mapping. QuantumNAS decouples the circuit search and parameter training by introducing a novel SuperCircuit, followed by evolutionary co-search of the SubCircuit and its qubit mapping. Finally, we perform iterative gate pruning and finetuning to remove redundant gates and reduce noise. QuantumNAS is the first to demonstrate over 95% 2-class, 85% 4-class, and 32% 10-class classification accuracy on real QC. It also achieves the lowest eigenvalue for VQE tasks. We open-source TorchQuantum for fast training of parameterized quantum circuits to facilitate future research. paper / qmlsys website / code
  • Oct 2021: PointAcc: Efficient Point Cloud Accelerator is presented at the International Symposium on Microarchitecture (MICRO’21). Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and lower #MACs. However, the extremely sparse nature of point clouds poses challenges to hardware acceleration. PointAcc proposes a versatile sorting engine to determine the nonzero input-output pairs, streams the sparse computation with reconfigurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Co-designed with light-weight neural networks, PointAcc rivals the prior art with 100X speedup and 9.1% higher accuracy for semantic segmentation. paper / website / slides / talk / lightning talk

  • July 2021: LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision is accepted by the International Conference on Computer Vision (ICCV’21). paper
  • July 2021: SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss is accepted by the International Conference on Intelligent Robots and Systems (IROS’21). paper
  • June 2021: Congrats to Zhijian and Yujun on receiving the Qualcomm Innovation Fellowship for the “Algorithm-Hardware Co-Design for Efficient LiDAR-Based Autonomous Driving” project.

  • June 2021: Congrats to Hanrui and Han on receiving the Qualcomm Innovation Fellowship for the “On-Device NLP Inference and Training with Algorithm-Hardware Co-Design” project.
  • April 2021: Once-for-All (OFA) Network set a world record in the open division of the MLPerf Inference Benchmark: 1.078M inferences per second on 8 A100 GPUs. paper / website / Github

  • March 2021: HAQ: Hardware-Aware Automated Quantization with Mixed Precision is integrated into the Intel OpenVINO Toolkit. paper

  • Feb 2021: Efficient and Robust LiDAR-Based End-to-End Navigation is accepted by ICRA’21. We introduce Fast-LiDARNet, which is based on sparse GPU kernel optimization and hardware-aware neural architecture search, improving the speed from 5 fps to 47 fps; together with Hybrid Evidential Fusion, which directly estimates the uncertainty and fuses the control predictions, reducing the number of takeovers in the road test. paper

  • Feb 2021: Anycost GANs for Interactive Image Synthesis and Editing is accepted by CVPR’21. GANs are large. GANs are slow. It takes seconds to edit a single image on edge devices, prohibiting interactive user experience. Anycost GANs can be executed at various computational cost budgets (up to 10x computation reduction) and adapt to a wide range of hardware and latency requirements. When deployed on edge devices, our model achieves 6-12x speedup, enabling interactive image editing on mobile devices. paper / website / video / code

  • Oct 2020: SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning appeared at HPCA’21 and was spotlighted by MIT News. Paper / Slides / Intro Video / Project Page
  • Jan 2021: IOS: Inter-Operator Scheduler for CNN Acceleration is accepted by MLSys’21. Existing deep learning frameworks focus on optimizing intra-operator parallelization. However, a single operator cannot fully utilize the available parallelism in a GPU, especially under small batch sizes. We extensively study the parallelism between operators and propose the Inter-Operator Scheduler (IOS) to automatically schedule the execution of multiple operators in parallel. paper / code / video / slides / poster

  • Dec 2020: MCUNet: Tiny Deep Learning on IoT Devices is presented at NeurIPS’20 as a spotlight presentation. paper / website / MIT News / Wired / Stacey on IoT / Morning Brew / IBM / Analytics Insight
  • Dec 2020: Tiny Transfer Learning: Reduce Memory, not Trainable Parameters for Efficient On-Device Learning is presented at NeurIPS’20. paper / slides / code
  • Dec 2020: Differentiable Augmentation for Data-Efficient GAN Training is presented at NeurIPS’20. code / website / talk / VentureBeat / blog
  • Aug 2020: Our team received the first place in the Low-Power Computer Vision Challenge, mobile CPU detection track.
  • Aug 2020: Our team received the first place in the Low-Power Computer Vision Challenge, FPGA track.
  • July 2020: SPVNAS ranks first on SemanticKITTI.
  • July 2020: Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution is accepted by ECCV’20.
  • June 2020: Once-For-All Network (OFA) for on-device AI is highlighted by Qualcomm.
  • June 2020: We open-sourced Data-Efficient GAN Training with DiffAugment on Github. Covered by VentureBeat.

  • May 2020: HAT: Hardware-Aware Transformer for Efficient Natural Language Processing to appear at ACL’2020. paper / code / website. This is our second paper on efficient NLP on edge devices, together with Lite Transformer, ICLR’20. paper / code / website / slides

  • April 2020: Slides for the ICLR’20 NAS workshop and the TinyML webinar “AutoML for TinyML with Once-for-All Network” are available.
  • April 2020: Once-For-All Network (OFA) is covered by MIT News and VentureBeat: Reducing the carbon footprint of artificial intelligence: MIT system cuts the energy required for training and running neural networks.
  • Mar 2020: Point-Voxel CNN for Efficient 3D Deep Learning is highlighted by the NVIDIA Jetson Community Project Spotlight.

  • Mar 2020: Point-Voxel CNN for Efficient 3D Deep Learning is deployed on MIT Driverless, improving the 3D detection accuracy from 95% to 99.93%, improving the detection range from 8m to 12m, and reducing the latency from 2ms/object to 1.25ms/object. demo
  • Feb 2020: SpArch: Efficient Architecture for Sparse Matrix Multiplication appeared at the International Symposium on High-Performance Computer Architecture (HPCA) 2020. Sparse Matrix Multiplication (SpMM) is an important primitive for many applications (graphs, sparse neural networks, etc). SpArch uses a spatial merger array to perform parallel merging of the partial sums, and a Huffman Tree scheduler to determine the optimal order in which to merge them, reducing DRAM access. paper / slides / website / 2min talk / full talk
  • Feb 2020: GAN Compression: Learning Efficient Architectures for Conditional GANs and APQ: Joint Search for Network Architecture, Pruning and Quantization Policy are accepted by CVPR’20.

  • Feb 2020: With our efficient model, the Once-for-All Network, our team is awarded the first place in the Low-Power Computer Vision Challenge (both classification and detection tracks).
  • Jan 2020: Song received the NSF CAREER Award for “Efficient Algorithms and Hardware for Accelerated Machine Learning”.

  • Dec 2019: Once-For-All Network (OFA) is accepted by ICLR’2020. Train only once, specialize for many hardware platforms, from CPU/GPU to hardware accelerators. OFA decouples model training from architecture search. OFA consistently outperforms SOTA NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNet-V3) while reducing GPU hours and CO2 emission by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (<600M FLOPs). Paper / Code / Poster / MIT News / Qualcomm News / VentureBeat


  • Dec 2019: Lite Transformer with Long Short Term Attention is accepted by ICLR’2020. We investigate the mobile setting for NLP tasks to facilitate the deployment of NLP models on edge devices. [Paper]
  • Nov 2019: AutoML for Architecting Efficient and Specialized Neural Networks to appear at IEEE Micro.
  • Oct 2019: TSM is featured by MIT News / Engadget / NVIDIA News / MIT Technology Review.
  • Oct 2019: Our team is awarded the first place in the Low-Power Computer Vision Challenge, DSP track, at ICCV’19 using the Once-for-All Network.

  • Oct 2019: Our winning solution to the Visual Wake Words Challenge is highlighted by Google. The technique is ProxylessNAS. demo / code
  • Oct 2019: Open source: the search code for ProxylessNAS is available on Github.
  • Oct 2019: Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos is accepted by the NeurIPS workshop on Systems for ML. TSM, a compact model for video understanding, is hardware-friendly not only for inference but also for training. With TSM, we can scale up Kinetics training to 1536 GPUs and reduce the training time from 2 days to 15 minutes. TSM is highlighted in the opening remarks of AI Research Week hosted by the MIT-IBM Watson AI Lab. paper

  • Oct 2019: Distributed Training Across the World is accepted by the NeurIPS workshop on Systems for ML.

  • Oct 2019: Neural-Hardware Architecture Search is accepted by the NeurIPS workshop on ML for Systems.

  • Sep 2019: Point-Voxel CNN for Efficient 3D Deep Learning is accepted by NeurIPS’19 as a spotlight presentation. paper / demo / playlist / talk / slides / code / website
  • Sep 2019: Deep Leakage from Gradients is accepted by NeurIPS’19. paper / poster / code / website
  • July 2019: TSM: Temporal Shift Module for Efficient Video Understanding is accepted by ICCV’19. Video understanding is more computationally intensive than images, making it harder to deploy on edge devices. Frames in the temporal dimension are highly redundant. TSM keeps 2D convolution’s computation complexity while achieving better temporal modeling ability than 3D convolution. TSM also enables low-latency, real-time video recognition (13ms latency on Jetson Nano and 70ms latency on Raspberry Pi-3). paper / demo / code / poster / industry integration @NVIDIA / MIT News / Engadget / MIT Technology Review / NVIDIA News / NVIDIA Jetson Developer Forum
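The shift operation at the core of TSM is only a few lines; a NumPy sketch of the zero-padded channel shift (tensor sizes and the clip are illustrative):

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """x: (T, C, H, W) activations of one video clip. Shift 1/fold_div of the
    channels forward in time and 1/fold_div backward; the per-frame 2D
    convolution that follows then mixes information across neighboring frames
    at zero extra FLOPs."""
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                  # shift towards the past
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # shift towards the future
    out[:, 2 * fold:] = x[:, 2 * fold:]             # remaining channels untouched
    return out

x = np.arange(3 * 8, dtype=float).reshape(3, 8, 1, 1)  # 3 frames, 8 channels
y = temporal_shift(x)   # frame 0 now sees channel 0 of frame 1, etc.
```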
  • June 2019: HAN Lab is awarded the first place in the Visual Wake Words Challenge @CVPR’19. The task is human detection on an IoT device with a tight computation budget: <250KB model size, <250KB peak memory usage, <60M MACs. The techniques are described in the ProxylessNAS paper. code / Raspberry Pi and Pixel 3 demo
  • June 2019: Song is presenting “Design Automation for Efficient Deep Learning by Hardware-aware Neural Architecture Search and Compression” at the ICML workshop on On-Device Machine Learning & Compact Deep Neural Network Representations, the CVPR workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, the CVPR workshop on Efficient Deep Learning for Computer Vision, UCLA, TI, and the Workshop on Approximate Computing Across the Stack. paper / slides
  • June 2019: Open source. AMC: AutoML for Model Compression and Acceleration on Mobile Devices is available on Github. AMC uses reinforcement learning to automatically find the optimal sparsity ratio for channel pruning.
  • June 2019: Open source. HAQ: Hardware-Aware Automated Quantization with Mixed Precision is available on Github.
  • May 2019: Song Han received the Facebook Research Award.
  • April 2019: Defensive Quantization on MIT News: Improving Security as Artificial Intelligence Moves to Smartphones.
  • April 2019: Our manuscript of Design Automation for Efficient Deep Learning Computing is available on arXiv (accepted by the IEEE Micro journal). slides
  • March 2019: ProxylessNAS is covered by MIT News: Kicking Neural Network Design Automation into High Gear and IEEE Spectrum: Using AI to Make Better AI.
  • March 2019: HAQ: Hardware-Aware Automated Quantization with Mixed Precision is accepted by CVPR’19 as an oral presentation. HAQ leverages reinforcement learning to automatically determine the quantization policy (bit width per layer), and we take the hardware accelerator’s feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback (both latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automatic and can specialize the quantization policy for different neural network architectures and hardware architectures. So far, ProxylessNAS [ICLR’19] => AMC [ECCV’18] => HAQ [CVPR’19] forms a pipeline of efficient AutoML.
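The per-layer knob that HAQ's RL agent tunes can be illustrated with plain symmetric linear quantization (a generic sketch, not HAQ's exact quantizer; the weights are random toy data):

```python
import numpy as np

def linear_quantize(w, bits):
    """Symmetric linear quantization of one layer's weights to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
err = lambda b: float(np.abs(linear_quantize(w, b) - w).mean())
# more bits -> lower reconstruction error; HAQ trades this off against the
# latency/energy feedback from a hardware simulator, layer by layer
```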
  • Feb 2019: Song presented “Bandwidth-Efficient Deep Learning with Algorithm and Hardware Co-Design” at ISSCC’19 in the forum “Intelligence at the Edge: How Can We Make Machine Learning More Energy Efficient?”
  • Jan 2019: Song is appointed to the Robert J. Shillman (1974) Career Development Chair.
  • Jan 2019: “Song Han: Democratizing artificial intelligence with deep compression” by the MIT Industry Liaison Program. article / video
  • Dec 2018: Congrats Xiangning on receiving the 2nd place in the feedback phase of the NeurIPS’18 AutoML Challenge: AutoML for Lifelong Machine Learning.
  • Dec 2018: Defensive Quantization: When Efficiency Meets Robustness is accepted by ICLR’19. Neural network quantization is becoming an industry standard to compress and efficiently deploy deep learning models. Is model compression a free lunch? No, if not treated carefully. We discover that conventional quantization approaches are vulnerable to adversarial attacks. This paper aims to raise awareness about the security of quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models. paper / MIT News
  • Dec 2018: Learning to Design Circuits appeared at the NeurIPS workshop on Machine Learning for Systems (full version accepted by DAC’2020). Analog IC design relies on human experts to search for parameters that satisfy circuit specifications with their experience and intuitions, which is highly labor-intensive and time-consuming. This paper proposes a learning-based approach to size the transistors and help engineers shorten the design cycle. paper
  • Dec 2018: Our work on ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware is accepted by ICLR’19. Neural Architecture Search (NAS) is computation intensive. ProxylessNAS reduces GPU hours by 200x compared with NAS and GPU memory by 10x compared with DARTS, while directly searching on ImageNet. ProxylessNAS is hardware-aware: it can design specialized neural network architectures for different hardware, making inference fast. With >74.5% top-1 accuracy, the measured latency of ProxylessNAS is 1.8x faster than MobileNet-v2, the current industry standard for mobile vision. paper / code / demo / poster / MIT News / IEEE Spectrum / industry integration: @AWS, @Facebook
  • Sep 2018: Song Han received the Amazon Machine Learning Research Award.
  • Sep 2018: Song Han received the SONY Faculty Award.
  • Sep 2018: Our work on AMC: AutoML for Model Compression and Acceleration on Mobile Devices is accepted by ECCV’18. This paper proposes a learning-based method to perform model compression, rather than relying on human heuristics and rule-based methods. AMC can automate the model compression process, achieve a better compression ratio, and also be more sample efficient. It takes a shorter time and performs better than rule-based heuristics. AMC compresses ResNet-50 by 5x without losing accuracy. AMC makes MobileNet-v1 2x faster with 0.4% loss of accuracy. paper / website


Ph.D. Stanford University. Advisor: Prof. Bill Dally

B.S. Tsinghua University


Email: FirstnameLastname [at] mit [dot] edu

Students who are interested in internship, please email: