Song Han is an Associate Professor (starting July 1, 2022) at MIT’s EECS. He received his PhD degree from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique that can reduce neural network size by an order of magnitude without losing accuracy, and the hardware implementation “efficient inference engine” that first exploited pruning and weight sparsity in deep learning accelerators. His team’s work on hardware-aware neural architecture search (ProxylessNAS, Once-for-All Network (OFA), MCUNet) was integrated by Facebook, Amazon, Microsoft, Intel, and SONY, and received the first place in six low-power computer vision contest awards at flagship AI conferences. Song received Best Paper awards at ICLR’16 and FPGA’17, and multiple faculty awards from Amazon, SONY, Facebook, NVIDIA and Samsung. Song was named to the “35 Innovators Under 35” list by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning” and the IEEE “AI’s 10 to Watch: The Future of AI” award.
Group Website, Google Scholar, YouTube, Twitter, Github, LinkedIn
- TinyML and intelligent internet of things (IIoT), efficient training and inference, model compression and acceleration: [MLSys’22][ICLR’22][NeurIPS’21][NeurIPS’21][MLSys’21][NeurIPS’20, spotlight][NeurIPS’20][ICLR’20][CVPR’20][CVPR’20][ICLR’19][CVPR’19, oral][ECCV’18][ICLR’16, BP][NIPS’15]
- Efficient AI applications on edge devices (video / point cloud / NLP / GAN): [CVPR’22][ICRA’21][CVPR’21][NeurIPS’20][ACL’20][CVPR’20][ECCV’20][ICLR’20][NeurIPS’19, spotlight][ICCV’19]
- Hardware accelerators for neural networks: [MICRO’21][HPCA’21][HPCA’20][FPGA’17, BP][ISCA’16]
- Machine learning for hardware design, algorithm-hardware co-design: [DAC’21][DAC’20][NeurIPS’19 W]
- Quantum AI systems: [HPCA’22][DAC’22][DAC’22][TorchQuantum Library]
Our research has influenced and landed in many industrial products: Intel OpenVINO, Intel Neural Network Distiller, Apple Neural Engine, NVIDIA Sparse Tensor Core, AMD-Xilinx Vitis AI, Qualcomm AI Model Efficiency Toolkit (AIMET), Amazon AutoGluon, Facebook PyTorch, Microsoft NNI, SONY Neural Architecture Search Library, SONY Model Compression Toolkit, and the ADI MAX78000/MAX78002 Model Training and Synthesis Tool.
- Samsung Global Research Outreach (GRO) Award, 2021
- IEEE “AI’s 10 to Watch: The Future of AI” Award, 2020
- NSF CAREER Award, 2020
- NVIDIA Academic Partnership Award, 2020, 2021
- MIT Technology Review list of 35 Innovators Under 35, 2018
- SONY Faculty Award, 2017/2018/2020
- Amazon Machine Learning Research Award, 2018/2019
- Facebook Research Award, 2019
- Best paper award, FPGA’2017
- Best paper award, ICLR’2016
- First place, 6th AI Driving Olympics, NuScenes Segmentation Challenge, @ICRA 2021 [SPVNAS]
- First place, 5th Low-Power Computer Vision Challenge, CPU detection track & FPGA track, Aug 2020 [OFA]
- First place, 3D semantic segmentation on SemanticKITTI [SPVNAS]
- First place, 4th Low-Power Computer Vision Challenge, both CPU classification and detection tracks, Jan 2020
- First place, 3rd Low-Power Computer Vision Challenge, DSP track, @ICCV 2019
- First place, NLP track (WikiText-103), @NeurIPS 2019
- First place, Visual Wake Words Challenge, TF-lite track, @CVPR 2019
– MIT News, Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices
– Wired, AI Algorithms Are Slimming Down to Fit in Your Refrigerator
– MIT News, System brings deep learning to “internet of things” devices
– Morning Brew, Researchers Figured Out How to Fit More AI Than Ever onto Internet of Things Microchips
– IBM, New IBM-MIT system brings AI to microcontrollers – paving the way to ‘smarter’ IoT
– Analytics Insight, Amalgamating ML And IoT In Smart Home Devices
– MIT Homepage Spotlight, A language learning system that pays attention — more efficiently than ever before
– All About Circuits, Is Hardware the Key to Advancing Natural Language Processing?
– Embedded Computing Design, MIT’s SpAtten Architecture Uses Attention Mechanism for Advanced NLP
– MIT News, Making quantum circuits more robust
– Venture Beat, MIT researchers claim augmentation technique can train GANs with less data
Once-For-All Network
– Venture Beat, MIT aims for energy efficiency in AI model training
– MIT News, Reducing the carbon footprint of artificial intelligence
Research from MIT shows promising results for on-device AI
How MIT is making ground towards ‘greener’ AI
– AI Daily, New MIT Architecture May Lead To Smaller Carbon Footprints For Neural Networks
– Inhabitat, MIT moves toward greener, more sustainable artificial intelligence
Hardware-Aware Transformer [ACL’20]:
– MIT News, Shrinking deep learning’s carbon footprint
– Venture Beat, New AI technique speeds up language models on edge devices
Temporal Shift Module
– NVIDIA, New MIT Video Recognition Model Dramatically Improves Latency on Edge Devices
– MIT Technology Review, Powerful computer vision algorithms are now small enough to run on your phone
– Engadget, MIT-IBM developed a faster way to train video recognition AI
– MIT News, Faster video recognition for the smartphone era
– IEEE Spectrum, Using AI to Make Better AI
– MIT News, Kicking neural network design automation into high gear
We thank the generous sponsors of our research: ADI, Amazon, AMD, Apple, ARM, Cognex, Facebook, Ford, Google, Hyundai, IBM, Intel, Microsoft, MIT AI Hardware Program, MIT Microsystems Technology Lab, MIT-IBM Watson AI Lab, National Science Foundation, NVIDIA, Qualcomm, Samsung, Semiconductor Research Corporation, SONY, TI.
Network Augmentation for Tiny Deep Learning
is presented at ICLR’22. paper / code
March 2022: A journal paper that summarizes our philosophies for mobile deep learning:
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
We first present popular model compression methods, including pruning, factorization, and quantization, as well as light-weight primitives. To reduce the manual design cost, we present the hardware-aware AutoML framework, including neural architecture search (ProxylessNAS, Once-for-All) and automated pruning (AMC) and quantization (HAQ). We then cover efficient on-device training to enable user customization based on the local data on mobile devices (TinyTL). Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both the software and hardware perspectives.
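The pruning and quantization primitives surveyed above can be illustrated in a few lines. Below is a minimal NumPy sketch (ours, not the paper's implementation) of magnitude pruning and uniform symmetric "fake" quantization; the function names and the toy weights are our own.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def linear_quantize(w, bits):
    """Uniform symmetric 'fake' quantization to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

w = np.array([0.05, -0.8, 0.3, -0.02, 1.2, 0.4])
print(magnitude_prune(w, 0.5))  # three of the six weights become exactly zero
print(linear_quantize(w, 4))    # weights snapped to a 4-bit grid
```

In practice both steps are followed by fine-tuning to recover accuracy, which this sketch omits.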
- March 2022: Lite Pose: Efficient Architecture for Human Pose Estimation is accepted by CVPR’22. Lite Pose accelerates human pose estimation by up to 5x on Snapdragon 855, Jetson Nano and Raspberry Pi.
QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits
is presented at HPCA 2022. paper / qmlsys website / code
QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization
is accepted by DAC’22. Due to the large quantum noise (errors), the performance of quantum AI models degrades severely on real quantum devices. We present QuantumNAT, a QNN-specific framework that performs noise-aware optimizations in both the training and inference stages to improve robustness. We propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve robustness against noise, we propose noise injection into the training process by inserting quantum error gates into the QNN according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving a denoising effect against quantum noise. paper / website / code
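The two post-measurement steps can be sketched numerically. This is an illustrative NumPy toy, not the QuantumNAT code: normalization realigns the outcome distribution between noise-free and noisy runs, and quantization snaps outcomes to discrete levels for a denoising effect. The level set and the sample values are our assumptions.

```python
import numpy as np

def post_measurement_normalize(meas):
    """Normalize measurement outcomes per qubit: outcome statistics drift
    between noise-free simulation and noisy hardware, and normalization
    realigns the feature distribution."""
    mu = meas.mean(axis=0, keepdims=True)
    sigma = meas.std(axis=0, keepdims=True) + 1e-8
    return (meas - mu) / sigma

def post_measurement_quantize(meas, levels=(-1.0, 0.0, 1.0)):
    """Snap each outcome to the nearest discrete level, denoising small
    noise-induced perturbations."""
    levels = np.asarray(levels)
    idx = np.abs(meas[..., None] - levels).argmin(axis=-1)
    return levels[idx]

noisy = np.array([[0.9, -0.1], [1.1, 0.2], [-0.8, -0.05]])
print(post_measurement_quantize(noisy))  # rows snap to [1, 0], [1, 0], [-1, 0]
```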
February 2022: QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning
is accepted by DAC’22. In order to achieve practical training of quantum AI models, the training process needs to be offloaded to real quantum machines instead of using exponential-cost classical simulators. One common approach to obtaining QNN gradients is parameter shift, whose cost scales linearly with the number of qubits. We present QOC, the first experimental demonstration of practical on-chip parameterized quantum circuit training with parameter shift. However, we find that due to the significant quantum errors (noise) on real machines, gradients obtained from naive parameter shift have low fidelity and thus degrade the training accuracy. To this end, we further propose probabilistic gradient pruning to first identify gradients with potentially large errors and then remove them. Specifically, small gradients have larger relative errors than large ones, and thus a higher probability of being pruned. paper / website / code
Network Augmentation for Tiny Deep Learning
is accepted by ICLR’22. Training tiny models is different from training large models: rather than augmenting the data, we should augment the model, since tiny models tend to suffer from limited capacity. To alleviate this issue, NetAug augments the network (reverse dropout) instead of inserting noise into the dataset or the network. It puts the tiny model into larger models and encourages it to work as a sub-model of the larger models to get extra supervision, in addition to functioning as an independent model. paper
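The reverse-dropout idea can be sketched with a single shared weight matrix. This is an illustrative NumPy toy, not the NetAug code: the tiny model is a slice of a wider model, and the training objective adds the wider model's loss as extra supervision. The widths, the toy head, and α are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# The tiny model's weight is a slice of the augmented (wider) model's weight.
W_large = rng.normal(size=(16, 8))   # wider model: 16 hidden units
tiny_width = 4                       # tiny model: the first 4 units

def forward(x, width):
    h = np.maximum(0.0, W_large[:width] @ x)  # shared weights: tiny is a sub-model
    return h.sum()                            # toy task head

def netaug_loss(x, target, alpha=0.5):
    base = (forward(x, tiny_width) - target) ** 2       # tiny model's own loss
    aug = (forward(x, W_large.shape[0]) - target) ** 2  # loss as a sub-model of the wide model
    return base + alpha * aug  # extra supervision flows back into the shared slice

x = rng.normal(size=8)
print(netaug_loss(x, target=1.0))
```

At deployment only the tiny slice is kept, so the augmentation adds no inference cost.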
TorchSparse: Efficient Point Cloud Inference Engine
is accepted by MLSys’22. The sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently; existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. We introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: data movement and irregular computation. It optimizes the data orchestration by quantization and fused locality-aware memory access, reducing the memory movement cost by 2.7x. It also adopts adaptive matrix-multiplication grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. paper / code
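The gather–matmul–scatter structure of sparse convolution can be sketched in plain NumPy. This toy (ours, not TorchSparse itself) builds the input-output map for each kernel offset, gathers matching features, runs one dense matmul per offset, and scatters the results; the coordinates and weights below are illustrative.

```python
import numpy as np

def sparse_conv(coords, feats, weights, offsets):
    """Gather-matmul-scatter sparse convolution (stride 1).
    coords: (N, 2) integer voxel coordinates of active sites
    feats:  (N, Cin) features; weights: {offset tuple: (Cin, Cout) matrix}"""
    index = {tuple(c): i for i, c in enumerate(coords)}
    cout = next(iter(weights.values())).shape[1]
    out = np.zeros((len(coords), cout))
    for off in offsets:
        # input-output map for this kernel offset (the "rulebook")
        pairs = [(i, index[tuple(c + off)]) for i, c in enumerate(coords)
                 if tuple(c + off) in index]
        if not pairs:
            continue
        src, dst = map(np.array, zip(*pairs))
        # gather inputs, one dense matmul per offset, scatter to outputs
        # (dst indices are unique within one offset, so += is safe here)
        out[dst] += feats[src] @ weights[tuple(off)]
    return out

coords = np.array([[0, 0], [0, 1], [2, 2]])
feats = np.ones((3, 2))
offsets = [np.array([0, 0]), np.array([0, 1])]
weights = {(0, 0): np.eye(2), (0, 1): np.eye(2)}
print(sparse_conv(coords, feats, weights, offsets))
```

The real engine's contributions are precisely in making the gather/scatter memory traffic and the per-offset matmuls efficient on GPUs, which this toy ignores.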
MCUNet-v2: Memory Efficient Inference for Tiny Deep Learning
is presented at NeurIPS’21. Tiny deep learning on microcontroller units (MCUs) is challenging due to the limited memory size. The imbalanced memory distribution of CNNs exacerbates the issue: the first several blocks have an order of magnitude larger memory usage than the rest of the network. We propose a patch-by-patch inference scheduling, which operates only on a small spatial region of the feature map and significantly cuts down the peak memory. However, a naive implementation brings overlapping patches and computation overhead. We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead. MCUNetV2 sets a record ImageNet accuracy on MCU (71.8%), and achieves >90% accuracy on the visual wake words dataset under just 32kB SRAM. MCUNetV2 also unblocks object detection on tiny devices, achieving 16.9% higher mAP on Pascal VOC compared to the state-of-the-art result. Our study largely addressed the memory bottleneck in tinyML and paved the way for vision applications beyond classification, such as detection. paper / website / slides / demo / demo2 / MIT News / TechTalks
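The patch-by-patch scheduling can be illustrated with a toy first stage. In this NumPy sketch (ours, not the MCUNetV2 code) the stage is a 2×2 average pooling, so patches need no overlap; with real convolutions the patches must overlap by the receptive field, which is exactly the overhead that MCUNetV2's network redistribution reduces.

```python
import numpy as np

def stage(x):
    """Stand-in for the memory-hungry first stage: 2x2 average pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def patch_inference(x, n=2):
    """Run the first stage patch-by-patch: only one input patch (plus the
    small output buffer) is live at a time, cutting the peak memory."""
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    ph, pw = h // n, w // n
    for i in range(n):
        for j in range(n):
            patch = x[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
            out[i*ph//2:(i+1)*ph//2, j*pw//2:(j+1)*pw//2] = stage(patch)
    return out

x = np.arange(64, dtype=float).reshape(8, 8)
assert np.allclose(patch_inference(x), stage(x))  # identical result, lower peak memory
```

With n=2 the live high-resolution buffer shrinks to a quarter of the full activation, which is the effect that matters under a 32kB SRAM budget.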
Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning
is presented at NeurIPS’21. Federated learning suffers from high communication latency. We propose Delayed Gradient Averaging (DGA), which delays the averaging step to let local computation run ahead of communication. We theoretically prove that DGA attains a similar convergence rate to FedAvg, and empirically show that our algorithm can tolerate high network latency without compromising accuracy. DGA is implemented on a 16-node Raspberry Pi cluster; with both IID and non-IID partitions, DGA brings a 2.55× speedup. paper / website / slides / poster
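The delayed-averaging schedule can be simulated in a few lines. This NumPy sketch (ours, a synchronous simulation rather than the Raspberry Pi implementation) applies each local gradient immediately and, `delay` steps later, swaps the stale local term for that step's global average, so communication overlaps with computation.

```python
import numpy as np
from collections import deque

def dga_train(grads_per_step, n_workers, lr=0.1, delay=2):
    """Delayed Gradient Averaging, simulated synchronously: every worker
    applies its local gradient immediately; `delay` steps later the
    averaged gradient of that step 'arrives' and replaces the stale
    local term."""
    w = np.zeros(n_workers)      # each worker's copy of one scalar parameter
    in_flight = deque()          # gradients whose average has not arrived yet
    for g in grads_per_step:     # g: per-worker local gradients, shape (n_workers,)
        w -= lr * g              # local update, no waiting on the network
        in_flight.append(g)
        if len(in_flight) > delay:
            old = in_flight.popleft()
            w += lr * old        # undo the stale local term...
            w -= lr * old.mean() # ...and apply the global average instead
    return w

steps = [np.array([1.0, 3.0]), np.array([2.0, 0.0]), np.array([4.0, 2.0])]
print(dga_train(steps, n_workers=2, lr=0.1, delay=1))
```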
NAAS: Neural Accelerator Architecture Search
is presented at DAC’21. We proposed a novel data-driven approach for AI-designed AI accelerators. Such a data-driven method can discover knowledge that is hard for humans to express explicitly, and it scales efficiently. The design spaces of hardware, compiler, and neural networks are tightly entangled, so joint optimization beats separate optimization. Given the huge design space, a data-driven approach is desirable: with the same runtime, machine learning methods can explore more design points than human designers. NAAS proposes a data-driven, automatic design space exploration of neural accelerator architectures that outperforms human design. paper / website / slides / video
QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits
to appear at HPCA’22. Quantum noise is the fundamental challenge in Noisy Intermediate-Scale Quantum (NISQ) computers. We propose QuantumNAS (NAS: Noise-Adaptive Search), a comprehensive framework for noise-adaptive co-search of the variational circuit and qubit mapping. QuantumNAS decouples circuit search and parameter training by introducing a novel SuperCircuit, followed by evolutionary co-search of the SubCircuit and its qubit mapping. Finally, we perform iterative gate pruning and finetuning to remove redundant gates and reduce noise. QuantumNAS is the first to demonstrate over 95% 2-class, 85% 4-class, and 32% 10-class classification accuracy on real QC. It also achieves the lowest eigenvalue for VQE tasks. We open-source TorchQuantum for fast training of parameterized quantum circuits to facilitate future research.
paper / qmlsys website / code
PointAcc: Efficient Point Cloud Accelerator
is presented at the International Symposium on Microarchitecture (MICRO’21).
Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving. Compared to projecting the point cloud to 2D space, directly processing the 3D point cloud yields higher accuracy and lower #MACs. However, the extremely sparse nature of point clouds poses challenges to hardware acceleration. PointAcc proposes a versatile sorting engine to determine the nonzero input-output pairs, streams the sparse computation with reconfigurable caching, and temporally fuses consecutive dense layers to reduce the memory footprint. Co-designed with lightweight neural networks, PointAcc rivals the prior art by 100X speedup with 9.1% higher accuracy for semantic segmentation. paper / website / slides / talk / lightning talk
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
is accepted by the International Conference on Computer Vision (ICCV’21). paper
SemAlign: Annotation-Free Camera-LiDAR Calibration with Semantic Alignment Loss
is accepted by the International Conference on Intelligent Robots and Systems (IROS’21). paper
June 2021: Congrats to Zhijian and Yujun on receiving the Qualcomm Innovation Fellowship for the
“Algorithm-Hardware Co-Design for Efficient LiDAR-Based Autonomous Driving” project
June 2021: Congrats to Hanrui and Han on receiving the Qualcomm Innovation Fellowship for the “On-Device NLP Inference and Training with Algorithm-Hardware Co-Design” project
Once-for-All (OFA) Network
set a world record in the open division of the MLPerf Inference Benchmark: 1.078M inferences per second on 8 A100 GPUs.
paper / website / Github
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
is integrated into the Intel OpenVINO Toolkit. paper
Efficient and Robust LiDAR-Based End-to-End Navigation
is accepted by ICRA’21. We introduce Fast-LiDARNet, which is based on sparse GPU kernel optimization and hardware-aware neural architecture search, improving the speed from 5 fps to 47 fps; together with Hybrid Evidential Fusion, which directly estimates the uncertainty and fuses the control predictions, improving robustness in the road test. paper
Anycost GANs for Interactive Image Synthesis and Editing
is accepted by CVPR’21. GANs are large. GANs are slow. It takes seconds to edit a single image on edge devices, prohibiting interactive user experience. Anycost GANs can be executed at various computational cost budgets (up to 10× computation reduction) and adapt to a wide range of hardware and latency requirements. When deployed on edge devices, our model achieves a 6-12× speedup, enabling interactive image editing on mobile devices. paper / website / video / code
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
appeared at HPCA’21 and was spotlighted by MIT News. Paper / Slides / Intro Video / Project Page
IOS: Inter-Operator Scheduler for CNN Acceleration
is accepted by MLSys’21. Existing deep learning frameworks focus on optimizing intra-operator parallelization. However, a single operator cannot fully utilize the available parallelism in a GPU, especially under small batch sizes. We extensively study the parallelism between operators and propose the Inter-Operator Scheduler (IOS) to automatically schedule the execution of multiple operators in parallel. paper / code / video / slides / poster
MCUNet: Tiny Deep Learning on IoT Devices
is presented at NeurIPS’20 as a spotlight presentation. paper / website / MIT News / Wired / Stacey on IoT / Morning Brew / IBM / Analytics Insight
Dec 2020: Tiny Transfer Learning: Reduce Memory, not Trainable Parameters for Efficient On-Device Learning
is presented at NeurIPS’20. website / slides / code
Differentiable Augmentation for Data-Efficient GAN Training
is presented at NeurIPS’20. code / website / talk / VentureBeat / blog
Our team received the first place in the Low-Power Computer Vision Challenge, mobile CPU detection track.
Our team received the first place in the Low-Power Computer Vision Challenge, FPGA track.
SPVNAS ranks first on SemanticKITTI.
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
is accepted by ECCV’20.
Once-For-All Network (OFA)
for on-device AI is highlighted by Qualcomm
June 2020: We open-sourced
Data-Efficient GAN Training with DiffAugment
on Github. Covered by VentureBeat.
HAT: Hardware-Aware Transformer for Efficient Natural Language Processing
to appear at ACL’2020. paper / code / website. This is our second paper on efficient NLP on edge devices, together with Lite Transformer,
ICLR’20. paper / code / website / slides
- April 2020: Slides for the ICLR’20 NAS workshop and the TinyML webinar “AutoML for TinyML with Once-for-All Network” are available
Once-For-All Network (OFA)
is covered by MIT News and Venture Beat:
Reducing the carbon footprint of artificial intelligence: MIT system cuts the energy required for training and running neural networks.
Point-Voxel CNN for Efficient 3D Deep Learning
is highlighted by the NVIDIA Jetson Community Project Spotlight
Point-Voxel CNN for Efficient 3D Deep Learning
is deployed on MIT Driverless, improving the 3D detection accuracy from 95% to 99.93%, extending the detection range from 8m to 12m, and reducing the latency from 2ms/object to 1.25ms/object. demo
SpArch: Efficient Architecture for Sparse Matrix Multiplication
appeared at the International Symposium on High-Performance Computer Architecture (HPCA) 2020. Sparse Matrix Multiplication (SpMM) is an important primitive for many applications (graphs, sparse neural networks, etc). SpArch has a spatial merger array to perform parallel merges of the partial sums, and a Huffman Tree scheduler to determine the optimal order in which to merge the partial sums, reducing DRAM access. paper / slides / website / 2min talk / full talk
GAN Compression: Learning Efficient Architectures for Conditional GANs
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
are accepted by CVPR’20.
Feb 2020: With our efficient model, the Once-for-All Network, our team is awarded the first place
in the Low-Power Computer Vision Challenge (both classification and detection tracks).
January 2020: Song received the NSF CAREER Award for “Efficient Algorithms
and Hardware for Accelerated Machine Learning”.
Once-For-All Network (OFA)
is accepted by ICLR’2020. Train only once, specialize for many hardware platforms, from CPU/GPU to hardware accelerators. OFA decouples model training from architecture search.
OFA consistently outperforms SOTA NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNet-V3) while reducing GPU hours and CO2 emission by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (<600M FLOPs).
Paper / Code / Poster / MIT News / Qualcomm News / VentureBeat
Lite Transformer with Long-Short Range Attention
is accepted by ICLR’2020. We investigate the mobile setting for NLP tasks to facilitate the deployment of NLP models on edge devices. [Paper]
AutoML for Architecting Efficient and Specialized Neural Networks
to appear at IEEE Micro,
is featured by MIT News / Engadget / NVIDIA News / MIT Technology Review
Oct 2019: Our team is awarded the first place
in the Low-Power Computer Vision Challenge, DSP track, at ICCV’19 using the Once-for-All Network.
- Oct 2019: Our winning solution to the Visual Wake Words Challenge is highlighted by Google. The technique is ProxylessNAS. demo / code
- October 2019: Open source: the search code for ProxylessNAS is available on Github.
Oct 2019: Training Kinetics in 15 Minutes: Large-scale Distributed Training
on Videos is accepted by the NeurIPS workshop on Systems for ML.
TSM, a compact model for video understanding, is hardware-friendly not only for inference but also for training. With TSM, we can scale up Kinetics training to 1536 GPUs and reduce the training time from two days to 15 minutes.
TSM is highlighted at the AI Research Week hosted by the MIT-IBM Watson AI Lab
October 2019: Distributed Training Across the World
is accepted by the NeurIPS workshop on Systems for ML.
Oct 2019: Neural-Hardware Architecture Search
is accepted by the NeurIPS workshop on ML for Systems.
Point-Voxel CNN for Efficient 3D Deep Learning
is accepted by NeurIPS’19 as a spotlight presentation. paper / demo / playlist / talk / slides / code / website
Deep Leakage from Gradients
is accepted by NeurIPS’19. paper / poster / code / website
TSM: Temporal Shift Module for Efficient Video Understanding
is accepted by ICCV’19. Video understanding is more computationally intensive than images, making it harder to deploy on edge devices. Frames in the temporal dimension are highly redundant. TSM keeps 2D convolution’s computation complexity yet achieves better temporal modeling ability than 3D convolution. TSM also enables low-latency, real-time video recognition (13ms latency on Jetson Nano and 70ms latency on Raspberry Pi 3). paper / demo / code / poster / industry integration @NVIDIA / MIT News / Engadget / MIT Technology Review / NVIDIA News / NVIDIA Jetson Developer Forum
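The shift operation at the core of TSM is only a few lines. Below is an illustrative NumPy sketch of the module's channel shifting (the real module is a PyTorch op inserted into 2D CNNs; `fold_div` follows the paper's notation, and the toy input is ours).

```python
import numpy as np

def temporal_shift(x, fold_div=4):
    """Temporal Shift Module: x has shape (T, C, ...). Shift 1/fold_div of
    the channels one frame forward in time and 1/fold_div one frame
    backward (zero-padded); the rest stay. Zero extra FLOPs and zero
    parameters, yet the following 2D convolution now mixes information
    across neighboring frames."""
    fold = x.shape[1] // fold_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]              # shift forward in time
    out[:-1, fold:2*fold] = x[1:, fold:2*fold]  # shift backward in time
    out[:, 2*fold:] = x[:, 2*fold:]             # not shifted
    return out

x = np.arange(8, dtype=float).reshape(2, 4)  # T=2 frames, C=4 channels
print(temporal_shift(x))
```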
- June 2019: HAN Lab is awarded the first place in the Visual Wake-up Word Challenge @CVPR’19. The task is human detection on an IoT device that has a tight computation budget: <250KB model size, <250KB peak memory usage, <60M MACs. The techniques are described in the ProxylessNAS paper. code / Raspberry Pi and Pixel 3 demo
- June 2019: Song is presenting “Design Automation for Efficient Deep Learning by Hardware-aware Neural Architecture Search and Compression” at the ICML workshop on On-Device Machine Learning & Compact Deep Neural Network Representations, the CVPR workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, the CVPR workshop on Efficient Deep Learning for Computer Vision, UCLA, TI, and the Workshop on Approximate Computing Across the Stack. paper / slides
- June 2019: Open source. AMC: AutoML for Model Compression and Acceleration on Mobile Devices is available on Github. AMC uses reinforcement learning to automatically find the optimal sparsity ratio for channel pruning.
- June 2019: Open source. HAQ: Hardware-aware Automated Quantization with Mixed Precision is available on Github.
- May 2019: Song Han received the Facebook Research Award.
- April 2019: Defensive Quantization on MIT News: Improving Security as Artificial Intelligence Moves to Smartphones.
- April 2019: Our manuscript of Design Automation for Efficient Deep Learning Computing is available on arXiv (accepted by the IEEE Micro journal). slides
- March 2019: ProxylessNAS is covered by MIT News: Kicking Neural Network Design Automation into High Gear and IEEE Spectrum: Using AI to Make Better AI.
HAQ: Hardware-Aware Automated
Quantization with Mixed Precision
is accepted by CVPR’19 as an oral presentation. HAQ leverages reinforcement learning to automatically determine the quantization policy (bit width per layer), and we take the hardware accelerator’s feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback (both latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network and hardware architectures. So far, ProxylessNAS [ICLR’19] => AMC [ECCV’18] => HAQ [CVPR’19] form a pipeline of efficient AutoML.
- Feb 2019: Song presented “Bandwidth-Efficient Deep Learning with Algorithm and Hardware Co-Design” at ISSCC’19 in the forum “Intelligence at the Edge: How Can We Make Machine Learning More Energy Efficient?”
- Jan 2019: Song is appointed to the Robert J. Shillman (1974) Career Development Chair.
- January 2019: “Song Han: Democratizing artificial intelligence with deep compression” by the MIT Industrial Liaison Program. article / video
- Dec 2018: Congrats Xiangning on receiving the 2nd place in the feedback phase of the NeurIPS’18 AutoML Challenge: AutoML for Lifelong Machine Learning.
Defensive Quantization: When Efficiency Meets Robustness
is accepted by ICLR’19. Neural network quantization is becoming an industry standard to compress and efficiently deploy deep learning models. Is model compression a free lunch? No, if not treated carefully. We discover that conventional quantization approaches are vulnerable to adversarial attacks. This paper aims to raise people’s awareness about the security of quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models. paper / MIT News
Learning to Design Circuits
appeared at the NeurIPS workshop on Machine Learning for Systems (full version accepted by DAC’2020). Analog IC design relies on human experts to search for parameters that satisfy circuit specifications with their experience and intuition, which is highly labor-intensive and time-consuming. This paper proposes a learning-based approach to size the transistors and help engineers shorten the design cycle. paper
Dec 2018: Our work on
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
is accepted by ICLR’19. Neural Architecture Search (NAS) is computation intensive. ProxylessNAS saves GPU hours by 200x compared with conventional NAS and GPU memory by 10x compared with DARTS, while directly searching on ImageNet. ProxylessNAS is hardware-aware: it can design specialized neural network architectures for different hardware, making inference fast. With >74.5% top-1 accuracy, the measured latency of ProxylessNAS is 1.8x faster than MobileNet-v2, the current industry standard for mobile vision. paper / code / demo / poster / MIT news / IEEE Spectrum / industry integration: @AWS, @Facebook
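The hardware-aware part of the search can be sketched as a differentiable latency objective. This NumPy toy (ours, heavily simplified from ProxylessNAS) weights each candidate op's measured latency by its architecture probability, so expected latency can be added to the training loss; the layer count, op set, and latency numbers are illustrative.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def expected_latency(arch_params, latency_table):
    """Per layer, weight each candidate op's measured latency by its
    architecture probability; the sum is differentiable w.r.t. the
    architecture parameters and can be added to the training loss."""
    return sum(softmax(alpha) @ lats
               for alpha, lats in zip(arch_params, latency_table))

# two layers, three candidate ops each; latencies in ms (illustrative numbers)
latency_table = [np.array([1.0, 2.5, 4.0]), np.array([0.5, 1.5, 3.0])]
arch_params = [np.array([2.0, 0.0, -1.0]), np.array([0.0, 0.0, 0.0])]
print(expected_latency(arch_params, latency_table))
```

Because the table holds latencies measured on a specific device, the same search specializes the architecture differently for each hardware target.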
- Sep 2018: Song Han received the Amazon Machine Learning Research Award.
- Sep 2018: Song Han received the SONY Faculty Award.
Sep 2018: Our work on
AMC: AutoML for Model Compression and Acceleration
on Mobile Devices
is accepted by ECCV’18. This paper proposes a learning-based method to perform model compression, rather than relying on human heuristics and rule-based methods. AMC automates the model compression process, achieves a better compression ratio, and is also more sample-efficient: it takes a shorter time and does better than rule-based heuristics. AMC compresses ResNet-50 by 5x without losing accuracy, and makes MobileNet-v1 2x faster with only 0.4% loss of accuracy. paper / website
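The action AMC's RL agent takes — a per-layer sparsity ratio — feeds a simple channel-pruning step. Below is an illustrative NumPy sketch (ours, not the AMC code) that prunes output channels by L1 norm once the ratio is given; the RL search over ratios is omitted.

```python
import numpy as np

def prune_channels(w, ratio):
    """Channel pruning: drop the output channels with the smallest L1 norm.
    In AMC an RL agent chooses `ratio` layer by layer; here it is given."""
    n_keep = max(1, int(round(w.shape[0] * (1 - ratio))))
    norms = np.abs(w).sum(axis=tuple(range(1, w.ndim)))  # L1 norm per channel
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])     # strongest channels, in order
    return w[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))   # (out_channels, in_channels, kH, kW)
pruned, keep = prune_channels(w, ratio=0.5)
print(pruned.shape)  # (4, 4, 3, 3): half of the output channels removed
```

Pruning an output channel also removes the matching input channel of the next layer, which is where the real compression and speedup come from; a fine-tuning pass then recovers accuracy.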
Ph.D. Stanford University. Advisor: Prof. Bill Dally
B.S. Tsinghua University
Email: FirstnameLastname [at] mit [dot] edu
Students who are interested in internships, please email: