---
tags:
- neural-architecture-search
- evolutionary-computation
- computer-vision
- depth-estimation
- object-detection
- semantic-segmentation
- 3d-gaussian-splatting
- mamba
- vision-transformer
- multi-objective-optimization
datasets:
- imagenet-1k
- detection-datasets/coco
- scene_parse_150
- kitti
- nyu_depth_v2
- RealEstate10K
metrics:
- mAP
- miou
- abs_rel
- psnr
- ssim
pipeline_tag: depth-estimation
library_name: pytorch
---

EvoNAS: Dual-Domain Representation Alignment for Geometry-Aware Architecture Search

arXiv GitHub

Overview

EvoNAS is a multi-objective evolutionary neural architecture search framework that discovers Pareto-optimal vision backbones bridging 2D dense prediction and 3D rendering. It features:

  • Hybrid VSS-ViT Search Space: Combines Vision State Space (Mamba) blocks with Vision Transformers
  • CA-DDKD: Cross-Architecture Dual-Domain Knowledge Distillation via DCT constraints
  • DMMPE: Hardware-isolated distributed evaluation engine for unbiased latency measurement
  • Progressive Supernet Training (PST): Curriculum-based weight-sharing optimization

The discovered EvoNets achieve state-of-the-art accuracy-efficiency trade-offs across object detection, semantic segmentation, monocular depth estimation, and novel view synthesis.
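The core of the search is selecting architectures that are non-dominated across competing objectives (e.g. error vs. latency). The following is a minimal pure-Python sketch of Pareto-front extraction, not the actual EvoNAS implementation (which uses pymoo); the candidate values are hypothetical:

```python
# Toy illustration of multi-objective selection: keep only candidates
# that no other candidate beats in every objective simultaneously.
# Each candidate is an (error, latency_ms) pair; both are minimized.

def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated subset of the population."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# Hypothetical (error, latency_ms) pairs for sampled sub-networks
population = [(0.060, 18.6), (0.056, 24.6), (0.054, 28.0),
              (0.058, 30.0), (0.061, 20.0)]
front = pareto_front(population)
# The last two candidates are dominated and filtered out
```

In the full framework this selection is combined with crossover and mutation over architecture genotypes; the Pareto front after the final generation yields the EvoNet family reported below.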

Model Zoo

Searched Architectures (EvoNets)

Object Detection on COCO (Mask R-CNN)

| Model | Params | MACs | AP^b | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|
| EvoNet-C1 | 33M | 190G | 45.4 | 50.2ms | 26 FPS | 1.39 | Download |
| EvoNet-C2 | 36M | 202G | 47.1 | 55.4ms | 23 FPS | 1.29 | Download |
| EvoNet-C3 | 42M | 228G | 48.5 | 66.9ms | 18 FPS | 1.15 | Download |

Semantic Segmentation on ADE20K (UPerNet)

| Model | Params | MACs | mIoU | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|
| EvoNet-A1 | 23M | 711G | 44.1 | 77.3ms | 14 FPS | 1.93 | Download |
| EvoNet-A2 | 26M | 724G | 47.3 | 81.0ms | 13 FPS | 1.79 | Download |
| EvoNet-A3 | 32M | 754G | 49.7 | 94.8ms | 12 FPS | 1.57 | Download |

Monocular Depth Estimation on KITTI

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|---|
| EvoNet-K1 | 18.0M | 27.3G | 0.060 | 0.960 | 18.6ms | 117 FPS | 5.34 | Download |
| EvoNet-K2 | 22.6M | 36.2G | 0.056 | 0.966 | 24.6ms | 83 FPS | 4.28 | Download |
| EvoNet-K3 | 26.3M | 45.0G | 0.054 | 0.969 | 28.0ms | 65 FPS | 3.68 | Download |

Monocular Depth Estimation on NYU Depth v2

| Model | Params | MACs | Abs Rel↓ | δ₁↑ | Latency | Throughput | NID | Weight |
|---|---|---|---|---|---|---|---|---|
| EvoNet-N1 | 19.1M | 21.7G | 0.095 | 0.912 | 21.8ms | 138 FPS | 4.77 | Download |
| EvoNet-N2 | 24.1M | 27.1G | 0.089 | 0.926 | 25.9ms | 107 FPS | 3.85 | Download |
| EvoNet-N3 | 30.3M | 33.9G | 0.085 | 0.932 | 30.8ms | 88 FPS | 3.08 | Download |

Novel View Synthesis on RealEstate10K (3DGS)

| Model | Params | PSNR↑ | SSIM↑ | LPIPS↓ | Latency | Throughput | Weight |
|---|---|---|---|---|---|---|---|
| EvoNet-D | 44M | 26.41 | 0.871 | 0.127 | 88ms | 27 FPS | Download |

Supernet Checkpoints

| Checkpoint | Description | Weight |
|---|---|---|
| supernet_imagenet_1k | Stage 1: ImageNet-1K pretrained VSS-ViT supernet | Download |
| supernet_nyu | Stage 2: Fine-tuned on NYU Depth v2 with CA-DDKD | Download |
| supernet_kitti | Stage 2: Fine-tuned on KITTI with CA-DDKD | Download |
| supernet_ade20k | Stage 2: Fine-tuned on ADE20K with CA-DDKD | Download |
| supernet_coco | Stage 2: Fine-tuned on COCO with CA-DDKD | Download |

Teacher Models (Depth Anything)

| Checkpoint | Description | Weight |
|---|---|---|
| nyu_depth_anything | Depth Anything metric indoor teacher | Download |
| kitti_depth_anything | Depth Anything metric outdoor teacher | Download |
| ade20k_vitl | ViT-L teacher for ADE20K segmentation | Download |
| coco_dinov2 | DINOv2 teacher for COCO detection | Download |

Quick Start

```python
# Download a specific model
from huggingface_hub import hf_hub_download

# Example: Download EvoNet-N3 (NYU Depth v2)
ckpt_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="EvoNAS/evonet_n3_best_abs_rel_0.08475",
)

# Example: Download the ImageNet-1K pretrained supernet
supernet_path = hf_hub_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    filename="supernet_imagenet_1k.pth",
)

# Download all checkpoints
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="YOUR_USERNAME/EvoNAS",
    local_dir="./evonas_checkpoints",
)
```

Usage

Please refer to our GitHub repository for full training, search, and evaluation instructions.

Inference Example (Monocular Depth Estimation)

```python
import torch

from networks.EvoMambaDepthNet import EvoMambaDepthNet

# Define the searched architecture genotype
evonet_n3_genotype = {
    # Replace with the actual searched genotype from the search logs
    "d_state": [...],
    "ssm_expand": [...],
    "mlp_ratio": [...],
    "depth": [...],
}

model = EvoMambaDepthNet(genotype=evonet_n3_genotype)
checkpoint = torch.load("evonet_n3_best_abs_rel_0.08475", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

# Run inference on a preprocessed image tensor of shape (B, 3, H, W)
with torch.no_grad():
    depth = model(image_tensor)
```
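For reference, the two depth metrics reported in the tables above can be computed as follows. This is a minimal sketch over flat lists of depths in meters (not code from the EvoNAS repository, and the sample values are hypothetical): Abs Rel is the mean of |pred − gt| / gt, and δ₁ is the fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25.

```python
# Standard monocular depth metrics: Abs Rel (lower is better)
# and delta1 accuracy (higher is better).

def abs_rel(pred, gt):
    """Mean absolute relative error over valid depth values."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

def delta1(pred, gt):
    """Fraction of pixels with max(pred/gt, gt/pred) < 1.25."""
    return sum(max(p / g, g / p) < 1.25 for p, g in zip(pred, gt)) / len(gt)

# Hypothetical ground-truth and predicted depths (meters)
gt_depths   = [2.0, 4.0, 8.0, 1.0]
pred_depths = [2.1, 3.8, 8.4, 1.4]

print(abs_rel(pred_depths, gt_depths))  # mean relative error
print(delta1(pred_depths, gt_depths))   # inlier ratio at 1.25
```

In practice, both metrics are computed only over valid ground-truth pixels (e.g. within the standard KITTI depth cap), which the sketch above omits.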

File Structure

```
.
├── EvoNAS/                          # Searched EvoNet checkpoints
│   ├── evonet_c{1,2,3}_*            # COCO object detection
│   ├── evonet_a{1,2,3}_*            # ADE20K semantic segmentation
│   ├── evonet_k{1,2,3}_*            # KITTI depth estimation
│   ├── evonet_n{1,2,3}_*            # NYU v2 depth estimation
│   └── logs/                        # Training logs
├── NVS/                             # Novel view synthesis checkpoint
│   └── epoch_9-step_150000.ckpt
├── SuperNet_FT/                     # Fine-tuned supernet checkpoints
│   ├── supernet_ade20k.pth
│   ├── supernet_coco.pth
│   ├── supernet_kitti
│   └── supernet_nyu
├── pre_DA/                          # Teacher model checkpoints
│   ├── ade20k_vitl_mIoU_59.4.pth
│   ├── coco_dinov2_epoch_12.pth
│   ├── kitti_depth_anything_metric_depth_outdoor.pt
│   └── nyu_depth_anything_metric_depth_indoor.pt
└── supernet_imagenet_1k.pth         # ImageNet-1K pretrained supernet
```

Citation

```bibtex
@article{zhang2025evonas,
  title={Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search},
  author={Zhang, Haoyu and Yu, Zhihao and Wang, Rui and Jin, Yaochu and Liu, Qiqi and Cheng, Ran},
  journal={arXiv preprint arXiv:2603.19563},
  year={2025}
}
```

Acknowledgements

We thank the open-source community behind PyTorch, Mamba SSM, Spatial-Mamba, MMDetection, MMSegmentation, Depth Anything, pymoo, and timm.
