---
tags:
- neural-architecture-search
- evolutionary-computation
- computer-vision
- depth-estimation
- object-detection
- semantic-segmentation
- 3d-gaussian-splatting
- mamba
- vision-transformer
- multi-objective-optimization
datasets:
- imagenet-1k
- detection-datasets/coco
- scene_parse_150
- kitti
- nyu_depth_v2
- RealEstate10K
metrics:
- mAP
- miou
- abs_rel
- psnr
- ssim
pipeline_tag: depth-estimation
library_name: pytorch
---
# EvoNAS: Dual-Domain Representation Alignment for Geometry-Aware Architecture Search

## Overview
EvoNAS is a multi-objective evolutionary neural architecture search framework that discovers Pareto-optimal vision backbones bridging 2D dense prediction and 3D rendering. It features:
- **Hybrid VSS-ViT Search Space**: Combines Vision State Space (Mamba) blocks with Vision Transformers
- **CA-DDKD**: Cross-Architecture Dual-Domain Knowledge Distillation via DCT constraints
- **DMMPE**: Hardware-isolated distributed evaluation engine for unbiased latency measurement
- **Progressive Supernet Training (PST)**: Curriculum-based weight-sharing optimization
The discovered EvoNets achieve state-of-the-art accuracy-efficiency trade-offs across object detection, semantic segmentation, monocular depth estimation, and novel view synthesis.
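The multi-objective search behind these trade-offs can be pictured as a Pareto-archive evolutionary loop. The snippet below is a minimal, self-contained illustration only, not the EvoNAS implementation: the integer genotype, the proxy error/latency objectives, and the single-point mutation operator are all hypothetical stand-ins.

```python
import random

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (both minimized)."""
    return all(a <= b for a, b in zip(f_a, f_b)) and any(a < b for a, b in zip(f_a, f_b))

def evaluate(genotype):
    """Hypothetical proxies: more capacity -> lower error but higher latency."""
    capacity = sum(genotype)
    error = 1.0 / (1.0 + capacity)          # stands in for task error
    latency = sum(g * g for g in genotype)  # stands in for measured latency
    return (error, latency)

def mutate(genotype, choices=(0, 1, 2, 3)):
    """Resample one block's operator choice."""
    g = list(genotype)
    g[random.randrange(len(g))] = random.choice(choices)
    return tuple(g)

def pareto_search(n_blocks=4, pop_size=16, generations=30, seed=0):
    random.seed(seed)
    population = [tuple(random.choice((0, 1, 2, 3)) for _ in range(n_blocks))
                  for _ in range(pop_size)]
    archive = {}
    for _ in range(generations):
        # Evaluate parents and mutated offspring, cache by genotype.
        for g in population + [mutate(p) for p in population]:
            archive.setdefault(g, evaluate(g))
        # Keep only non-dominated genotypes (the current Pareto front).
        front = [g for g, f in archive.items()
                 if not any(dominates(f2, f)
                            for g2, f2 in archive.items() if g2 != g)]
        archive = {g: archive[g] for g in front}
        population = [mutate(random.choice(front)) for _ in range(pop_size)]
    return sorted(archive.items(), key=lambda kv: kv[1][1])

front = pareto_search()
```

The returned archive is the analogue of the EvoNet-x1/x2/x3 families below: a set of architectures none of which is strictly better than another on both objectives.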
## Model Zoo

### Searched Architectures (EvoNets)

#### Object Detection on COCO (Mask R-CNN)

| Model | Params | MACs | AP<sup>b</sup> | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|
| EvoNet-C1 | 33M | 190G | 45.4 | 50.2 ms | 26 FPS | 1.39 | Download |
| EvoNet-C2 | 36M | 202G | 47.1 | 55.4 ms | 23 FPS | 1.29 | Download |
| EvoNet-C3 | 42M | 228G | 48.5 | 66.9 ms | 18 FPS | 1.15 | Download |
#### Semantic Segmentation on ADE20K (UPerNet)

| Model | Params | MACs | mIoU | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|
| EvoNet-A1 | 23M | 711G | 44.1 | 77.3 ms | 14 FPS | 1.93 | Download |
| EvoNet-A2 | 26M | 724G | 47.3 | 81.0 ms | 13 FPS | 1.79 | Download |
| EvoNet-A3 | 32M | 754G | 49.7 | 94.8 ms | 12 FPS | 1.57 | Download |
#### Monocular Depth Estimation on KITTI

| Model | Params | MACs | Abs Rel ↓ | δ₁ ↑ | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|---|
| EvoNet-K1 | 18.0M | 27.3G | 0.060 | 0.960 | 18.6 ms | 117 FPS | 5.34 | Download |
| EvoNet-K2 | 22.6M | 36.2G | 0.056 | 0.966 | 24.6 ms | 83 FPS | 4.28 | Download |
| EvoNet-K3 | 26.3M | 45.0G | 0.054 | 0.969 | 28.0 ms | 65 FPS | 3.68 | Download |
#### Monocular Depth Estimation on NYU Depth v2

| Model | Params | MACs | Abs Rel ↓ | δ₁ ↑ | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|---|
| EvoNet-N1 | 19.1M | 21.7G | 0.095 | 0.912 | 21.8 ms | 138 FPS | 4.77 | Download |
| EvoNet-N2 | 24.1M | 27.1G | 0.089 | 0.926 | 25.9 ms | 107 FPS | 3.85 | Download |
| EvoNet-N3 | 30.3M | 33.9G | 0.085 | 0.932 | 30.8 ms | 88 FPS | 3.08 | Download |
#### Novel View Synthesis on RealEstate10K (3DGS)

| Model | Params | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Latency | Throughput | Weight |
|---|---|---|---|---|---|---|---|
| EvoNet-D | 44M | 26.41 | 0.871 | 0.127 | 88 ms | 27 FPS | Download |
### Supernet Checkpoints

| Checkpoint | Description | Weight |
|---|---|---|
| `supernet_imagenet_1k` | Stage 1: ImageNet-1K pretrained VSS-ViT supernet | Download |
| `supernet_nyu` | Stage 2: Fine-tuned on NYU Depth v2 with CA-DDKD | Download |
| `supernet_kitti` | Stage 2: Fine-tuned on KITTI with CA-DDKD | Download |
| `supernet_ade20k` | Stage 2: Fine-tuned on ADE20K with CA-DDKD | Download |
| `supernet_coco` | Stage 2: Fine-tuned on COCO with CA-DDKD | Download |
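The Stage 1 / Stage 2 split above reflects Progressive Supernet Training, which widens the per-block choice space as training advances. A minimal sketch of curriculum-gated subnet sampling follows; the stage schedule and the choice values are illustrative assumptions, not the actual EvoNAS configuration.

```python
import random

# Hypothetical curriculum: each stage unlocks a larger per-block choice space.
PST_SCHEDULE = {
    0: [2],        # warm-up: a single mid-sized option per block
    1: [1, 2],     # add a smaller option
    2: [1, 2, 3],  # full choice space
}

def sample_subnet(num_blocks, stage):
    """Sample one weight-sharing subnet from the choices unlocked so far."""
    choices = PST_SCHEDULE[min(stage, max(PST_SCHEDULE))]
    return [random.choice(choices) for _ in range(num_blocks)]

random.seed(0)
warmup_net = sample_subnet(num_blocks=8, stage=0)  # deterministic: all blocks = 2
final_net = sample_subnet(num_blocks=8, stage=2)   # drawn from the full space
```

Each sampled subnet is trained with shared supernet weights; restricting early stages stabilizes the shared weights before the full space is explored.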
### Teacher Models (Depth Anything)

| Checkpoint | Description | Weight |
|---|---|---|
| `nyu_depth_anything` | Depth Anything metric indoor teacher | Download |
| `kitti_depth_anything` | Depth Anything metric outdoor teacher | Download |
| `ade20k_vitl` | ViT-L teacher for ADE20K segmentation | Download |
| `coco_dinov2` | DINOv2 teacher for COCO detection | Download |
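These teachers drive CA-DDKD, which constrains student and teacher features in both the spatial and the DCT frequency domain. The sketch below shows one way such a dual-domain loss can look on 1-D feature vectors; the naive DCT-II, the low-band cutoff, and the equal weighting of the two terms are illustrative assumptions, not the paper's exact formulation.

```python
import math

def dct2(x):
    """Naive DCT-II of a 1-D sequence (normalization constants omitted)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def dual_domain_loss(student_feat, teacher_feat, low_bands=4):
    """Spatial MSE plus an MSE over the lowest DCT frequency bands."""
    # Spatial-domain term: plain MSE between feature vectors.
    spatial = sum((s - t) ** 2
                  for s, t in zip(student_feat, teacher_feat)) / len(student_feat)
    # Frequency-domain term: match only the low-frequency DCT coefficients,
    # which carry the coarse geometric structure of the features.
    s_freq, t_freq = dct2(student_feat), dct2(teacher_feat)
    freq = sum((s - t) ** 2
               for s, t in zip(s_freq[:low_bands], t_freq[:low_bands])) / low_bands
    return spatial + freq
```

A loss of this shape lets a teacher with a different architecture supervise the student without requiring feature maps to match element-wise at high frequencies.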
## Quick Start

Download individual checkpoints or the full repository from the Hugging Face Hub:

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Download a single searched checkpoint
ckpt_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="EvoNAS/evonet_n3_best_abs_rel_0.08475",
)

# Download the ImageNet-1K pretrained supernet
supernet_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="supernet_imagenet_1k.pth",
)

# Alternatively, fetch every checkpoint at once
snapshot_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    local_dir="./evonas_checkpoints",
)
```
## Usage
Please refer to our GitHub repository for full training, search, and evaluation instructions.
### Inference Example (Monocular Depth Estimation)

```python
import torch

from networks.EvoMambaDepthNet import EvoMambaDepthNet

# Genotype of the searched EvoNet-N3 architecture (per-block choices;
# see the GitHub repository for the full values)
evonet_n3_genotype = {
    "d_state": [...],
    "ssm_expand": [...],
    "mlp_ratio": [...],
    "depth": [...],
}

model = EvoMambaDepthNet(genotype=evonet_n3_genotype)

# Load the searched checkpoint (downloaded as shown in Quick Start)
checkpoint = torch.load("evonet_n3_best_abs_rel_0.08475", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

with torch.no_grad():
    # image_tensor: a preprocessed input batch, e.g. shape (1, 3, H, W)
    depth = model(image_tensor)
```
## File Structure

```
.
├── EvoNAS/                      # Searched EvoNet checkpoints
│   ├── evonet_c{1,2,3}_*        # COCO object detection
│   ├── evonet_a{1,2,3}_*        # ADE20K semantic segmentation
│   ├── evonet_k{1,2,3}_*        # KITTI depth estimation
│   ├── evonet_n{1,2,3}_*        # NYU v2 depth estimation
│   └── logs/                    # Training logs
├── NVS/                         # Novel view synthesis checkpoint
│   └── epoch_9-step_150000.ckpt
├── SuperNet_FT/                 # Fine-tuned supernet checkpoints
│   ├── supernet_ade20k.pth
│   ├── supernet_coco.pth
│   ├── supernet_kitti
│   └── supernet_nyu
├── pre_DA/                      # Teacher model checkpoints
│   ├── ade20k_vitl_mIoU_59.4.pth
│   ├── coco_dinov2_epoch_12.pth
│   ├── kitti_depth_anything_metric_depth_outdoor.pt
│   └── nyu_depth_anything_metric_depth_indoor.pt
└── supernet_imagenet_1k.pth     # ImageNet-1K pretrained supernet
```
## Citation

```bibtex
@article{zhang2025evonas,
  title={Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search},
  author={Zhang, Haoyu and Yu, Zhihao and Wang, Rui and Jin, Yaochu and Liu, Qiqi and Cheng, Ran},
  journal={arXiv preprint arXiv:2603.19563},
  year={2025}
}
```
## Acknowledgements
We thank the open-source community behind PyTorch, Mamba SSM, Spatial-Mamba, MMDetection, MMSegmentation, Depth Anything, pymoo, and timm.