DATASETS DIRECTORY

129 AEC Datasets, Verified

Open research data for architecture, engineering & construction — floor plans, facades, 3D city models, energy data, codes, and procurement.

🇳🇱

3D BAG — 10M 3D Buildings (Netherlands)

💚 Open
Urban & 3D Models

10 million automatically reconstructed 3D buildings covering the entire Netherlands in LoD1.2, LoD1.3, and LoD2.2.

Size: 10 million buildings
Formats: CityJSON, CityGML, GPKG
By: TU Delft 3D Geoinformation
License: CC BY 4.0
reconstruction regression detection
View Dataset →
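CityJSON, one of the formats listed for the entry above, stores vertices as integers together with a `transform` object holding `scale` and `translate`. The following is a minimal sketch of decoding them into real-world coordinates. It uses a tiny hand-made document, not real 3D BAG data; the object ID and all numbers are invented for illustration.

```python
import json

# Hand-made CityJSON-like snippet; the ID and coordinates are illustrative only.
doc = json.loads("""
{
  "type": "CityJSON",
  "version": "1.1",
  "transform": {"scale": [0.001, 0.001, 0.001],
                "translate": [85000.0, 446000.0, 0.0]},
  "CityObjects": {"example-building-1": {"type": "Building", "attributes": {}}},
  "vertices": [[0, 0, 0], [10000, 0, 0], [10000, 5000, 3000]]
}
""")

def decode_vertices(cityjson):
    """Apply v * scale + translate to every integer vertex."""
    scale = cityjson["transform"]["scale"]
    translate = cityjson["transform"]["translate"]
    return [
        tuple(v[i] * scale[i] + translate[i] for i in range(3))
        for v in cityjson["vertices"]
    ]

real_vertices = decode_vertices(doc)

# Collect the IDs of all city objects typed as buildings.
building_ids = [
    obj_id for obj_id, obj in doc["CityObjects"].items()
    if obj["type"] == "Building"
]
```

Keeping vertices as integers keeps files compact; the decode step is only needed when actual coordinates are required.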
🔬

3D Format Shootout — IFC Test Datasets

💚 Open
Tools & Scripts

Test datasets for 3D file formats, focusing on IFC, OBJ, FBX, glTF, XKT, 3DS, MAX, and others, for benchmark comparisons.

Formats: IFC, OBJ, FBX
By: Christopher Diggins (ara3d)
License: MIT
reconstruction classification
View Dataset →
🛕

3DITA — Indian Temple Architecture Point Clouds

💚 Open
Architecture Images

325+ million points from 47 Nagara-style temple structures. India's first benchmark dataset for temple architecture semantic segmentation.

Size: 325M+ points, 47 temples
Formats: Point Cloud
License: Unknown
point-cloud segmentation classification
View Dataset →

ADA Accessibility Standards

💚 Open
Codes & Regulations

Full text of 2010 ADA Standards for Accessible Design and ADA/ABA Accessibility Guidelines from the U.S. Department of Justice.

Formats: HTML, PDF
By: U.S. Department of Justice
License: Government Open
nlp qa
View Dataset →
🧠

AECBench — LLM Evaluation for AEC

💚 Open
Codes & Regulations

4,800 questions across 23 task types from ECADI + Tongji University covering building codes, design calculations, and construction management.

Size: 4,800 questions
Formats: HuggingFace JSON
By: ECADI + Tongji University
License: Unknown
qa nlp
View Dataset →
📊

AECV-Bench

💚 Open
Floor Plans & BIM

Benchmark for evaluating AI ability to process architectural floor plans. Reported model scores: text extraction ~95%, spatial reasoning 55-75%, symbol counting 39-51%.

Formats: PNG, PDF, JSON
License: MIT
qa classification
View Dataset →
✍️

AIA Contract Documents

🔒 Paid
Procurement & Permits

300+ industry-standard A-Series, B-Series, C/D/E/G-Series contract forms including A101, A201, B101. Most widely used US construction contracts.

Size: 300+ forms
Formats: Word, PDF
By: American Institute of Architects
License: Custom
nlp
View Dataset →
📦

ARCAT — 33,000+ Free Product Specs

💚 Open
Codes & Regulations

1,100+ manufacturers, 33,000+ building products organized by CSI MasterFormat with downloadable specs, AutoCAD, and Revit BIM families.

Size: 33,000+ products
Formats: Word, PDF, CAD
By: ARCAT Inc.
License: Custom
nlp classification
View Dataset →
🏛️

ArCH — Heritage Point Cloud Segmentation

💚 Open
Architecture Images

17 annotated heritage scenes (millions of labeled 3D points) plus 10 unlabeled scenes. 10 semantic classes, UNESCO World Heritage sites included.

Size: 17+ annotated scenes
Formats: XYZ, RGB, Labels
By: Politecnico di Torino
License: Unknown
point-cloud segmentation classification
View Dataset →
🏢

ARCH2S — Exterior Architectural Structures

💚 Open
Tools & Scripts

Semantically-enriched photo-realistic 3D architectural models for semantic segmentation of building exteriors. 4 building types from Hong Kong.

Formats: Point Cloud, Semantic labels
License: Unknown
point-cloud segmentation classification
View Dataset →
🗂️

ArchCAD-400K

💚 Open
Floor Plans & BIM

413,062 chunks from 5,538 CAD drawings across 27 categories. Largest architectural CAD dataset for panoptic symbol spotting. NeurIPS 2025.

Size: 413,062 chunks
Formats: SVG, Vector
By: ArchiAI Lab
License: CC BY 4.0
detection segmentation classification
View Dataset →
🤖

Archilyse — Architecture AI Training Data

💚 Open
Tools & Scripts

AI training dataset for real estate/architecture with open-source AGPL-licensed floor plan to IFC annotation pipeline. Presented at ICCV 2023 in Paris.

Formats: IFC, Images, JSON
By: Dr. Matthias Standfest
License: GPL-3.0
reconstruction segmentation generation
View Dataset →
💰

Architect Salary Survey — 13K+ US Salaries

💚 Open
Procurement & Permits

13,000+ architect salary survey responses from across the United States with data visualization. Covers firm size, location, and career level.

Size: 13,000+ records
Formats: CSV
License: Unknown
regression classification
View Dataset →
🏺

Architectural Heritage Elements Image64

💚 Open
Architecture Images

Architectural heritage elements for classification at 64×64 resolution, organized by element class in folder-per-class structure.

Formats: JPG
License: Unknown
classification
View Dataset →
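A folder-per-class layout like the one described in the entry above can be indexed with the standard library alone. This is a rough sketch, assuming each subfolder name is the class label; the accepted file suffixes and the function names are illustrative choices, not part of the dataset.

```python
from collections import Counter
from pathlib import Path

# Assumed image suffixes; adjust to whatever the download actually contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def index_by_class(root):
    """Return (path, class_name) pairs, using each subfolder name as the label."""
    root = Path(root)
    samples = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for img in sorted(class_dir.iterdir()):
            if img.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((img, class_dir.name))
    return samples

def class_counts(samples):
    """Count samples per class to spot imbalance before training."""
    return Counter(label for _, label in samples)
```

Calling `class_counts(index_by_class("path/to/dataset"))` (with a hypothetical path) gives a quick per-class tally.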
🏛️

Architectural Styles Dataset — 25 Classes

💚 Open
Architecture Images

10,113 images across 25 architectural styles from multiple time periods and mixed sources. Organized in 25 class folders.

Size: 10,113 images
Formats: JPG, PNG
License: Unknown
classification
View Dataset →
🏷️

ArchShapesNet — 44,000 BIM Element Samples

💚 Open
Tools & Scripts

44,000 BIM element samples (4,000 per class × 11 classes). First large-scale BIM element classification dataset from Seoul National University of Science and Technology.

Size: 44,000 elements
Formats: Multi-view renders, IFC
By: Youngsu Yu et al.
License: Unknown
classification detection
View Dataset →
🗺️

Awesome CityGML — 210M+ Global 3D Buildings

💚 Open
Urban & 3D Models

Curated directory of 210+ million semantic 3D buildings from 21 countries and 65+ cities including Netherlands, Poland, Luxembourg, Germany, and Singapore.

Size: 210M+ buildings, 65+ cities
Formats: CityGML, CityJSON, OBJ
By: Olaf Wysocki (OloOcki)
License: Unknown
reconstruction regression
View Dataset →

Awesome Procurement Data

💚 Open
Procurement & Permits

Curated list of US federal procurement data resources, APIs, and utilities including pysam, SamDotNet, and procurement-tools Python wrappers.

Formats: GitHub, Links
By: MakeGov
License: Unknown
nlp regression
View Dataset →
🔄

BatchPlan — Floor Plan Extraction from IFC

💚 Open
Tools & Scripts

Python tool that extracts geometric data (CSV/WKT) from IFC models and generates 2D floor plans (PNG) for large-scale dataset creation.

Formats: Python, CSV, WKT
By: Ali Khatami
License: Unknown
reconstruction segmentation
View Dataset →
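Geometry exported as WKT, as in the entry above, can be read without a GIS library when the shapes are simple polygons. The sketch below parses the exterior ring of a `POLYGON` string and computes its area with the shoelace formula. The sample room polygon is hypothetical and does not come from BatchPlan output; holes and multi-polygons are ignored.

```python
import re

def parse_wkt_polygon(wkt):
    """Parse the exterior ring of a simple 'POLYGON ((x y, x y, ...))' string."""
    match = re.match(r"\s*POLYGON\s*\(\(([^()]*)\)", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a POLYGON WKT string")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()[:2]  # ignore an optional Z value
        ring.append((float(x), float(y)))
    return ring

def shoelace_area(ring):
    """Area of a simple polygon; works for closed or unclosed rings."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical 4 x 3 room outline, in whatever units the export uses.
room = "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"
area = shoelace_area(parse_wkt_polygon(room))
```

For real workflows a geometry library would handle holes, validity checks, and coordinate systems; the stdlib version only illustrates the data shape.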
🔍

BD3 — Building Defects Detection Dataset

💚 Open
Construction & Safety

3,965 annotated RGB images from 50+ buildings (10–60 years old) covering 6 defect types: algae, major/minor crack, peeling, spalling, stain.

Size: 3,965 images
Formats: JPG, Folder-per-class
License: Unknown
classification detection
View Dataset →

BIM Open Schema + DuckDB

💚 Open
Tools & Scripts

Open standard schema enabling sub-second data export from Revit models to DuckDB. Eliminates traditional BIM data bottlenecks.

Formats: DuckDB, Schema, Revit
License: MIT
classification reconstruction
View Dataset →
🔴

BIMNet — Scan-to-BIM Benchmark

💚 Open
Tools & Scripts

116.5M points from 25 real-world scans, 382 rooms, 8,710 m². First openBIM-based scan-to-BIM dataset with IFC ground-truth models.

Size: 116.5M points, 25 scans
Formats: Point Cloud, IFC
By: Tsinghua University
License: Unknown
point-cloud segmentation reconstruction
View Dataset →
🧱

Brick by Brick 2024 — Building Data Classification

💚 Open
Building Energy

ML competition ($20K AUD) for automating building data classification using Brick schema. Based on NeurIPS 2024 Building Timeseries Dataset.

Formats: CSV, JSON
By: UNSW / CSIRO / RACE for 2030
License: Unknown
classification regression
View Dataset →
📑

BRIDGE — 13,000+ Floor Plans with Descriptions

💚 Open
Floor Plans & BIM

Largest annotated floor plan dataset for Document Analysis. Aggregates ROBIN, SESYD, and web sources with bounding boxes and paragraph descriptions.

Size: 13,000+ floor plans
Formats: Mixed images, JSON
By: S. Goyal et al.
License: Unknown
detection segmentation classification
View Dataset →
🔍

Building Facade Segmentation (Roboflow)

💚 Open
Facades

598 images with 10 classes (including balcony, car, facade, fence, shop, street, traffic, vegetation, and window) in COCO JSON, Pascal VOC, and YOLOv8 formats.

Size: 598 images
Formats: JPG, JSON, XML
By: Roboflow Universe
License: CC BY 4.0
segmentation detection
View Dataset →
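COCO JSON, one of the export formats named in the entry above, keeps `images`, `categories`, and `annotations` in one file, linked by IDs. A small sketch of counting annotations per class follows. The inline document is invented for illustration and is not taken from the Roboflow export.

```python
import json
from collections import Counter

# Invented miniature COCO document; file names and boxes are illustrative.
coco = json.loads("""
{
  "images": [{"id": 1, "file_name": "facade_001.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "window"}, {"id": 2, "name": "balcony"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [50, 60, 40, 80]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [150, 60, 40, 80]},
    {"id": 12, "image_id": 1, "category_id": 2, "bbox": [40, 200, 200, 30]}
  ]
}
""")

def annotations_per_class(coco_doc):
    """Map category_id to its name, then tally annotations by class name."""
    names = {c["id"]: c["name"] for c in coco_doc["categories"]}
    return Counter(names[a["category_id"]] for a in coco_doc["annotations"])

counts = annotations_per_class(coco)
```

COCO `bbox` values are `[x, y, width, height]` in pixels, measured from the top-left corner.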

BuildingsBench — 900K Building Energy Profiles

💚 Open
Building Energy

53.6M measurements from 3,053 meters in 1,636 buildings over 2 years, plus 900K synthetic profiles from NREL. NeurIPS 2023.

Size: 53.6M measurements, 900K profiles
Formats: CSV, Parquet
By: NREL / U.S. DOE
License: CC BY 4.0
regression classification
View Dataset →
📏

CAADRIA 2026 — 71,334 Architectural Plans

💚 Open
Floor Plans & BIM

71,334 automated architectural plans in DrawScript vector format generated using shape grammar and visual programming. AI training dataset.

Size: 71,334 plans
Formats: DrawScript, Vector
By: Tzu-Chieh Hong
License: Unknown
generation classification
View Dataset →
🦌

Caribou — OpenStreetMap for Grasshopper

💚 Open
Urban & 3D Models

Framework for processing large-scale OpenStreetMap urban data in Grasshopper/Rhino, enabling parametric urban modeling workflows.

Formats: GeoJSON, OSM, Grasshopper
License: Unknown
reconstruction graph
View Dataset →
📊

CBE Post-Occupancy Survey

💚 Open
Building Energy

Research data from Center for the Built Environment comparing WELL vs LEED building IEQ satisfaction scores. Free download.

Formats: CSV
By: Center for the Built Environment
License: Unknown
regression classification
View Dataset →
🌆

City-Level Open Permit Data

💚 Open
Procurement & Permits

Building permits from major US/global cities via open data portals: NYC, Chicago, Seattle, Los Angeles, San Francisco, Vancouver, Melbourne.

Formats: CSV, API
License: Government Open
regression classification
View Dataset →
🏛️

CMP Facade Database

💚 Open
Facades

606 rectified facade images with manual pixel-level annotations across diverse architectural styles from the Czech Technical University.

Size: 606 images
Formats: JPG, PNG
By: Czech Technical University Prague
License: Unknown
segmentation
View Dataset →
📝

CODE-ACCORD — Building Regulation NLP Dataset

💚 Open
Codes & Regulations

862 sentences, 4,297 entities, 4,329 relations from 33 regulatory documents (1,595 pages) annotated by 12 experts. England and Finland.

Size: 862 sentences, 4,297 entities
Formats: CSV, HuggingFace
By: ACCORD NLP Project
License: CC BY 4.0
nlp qa
View Dataset →
🔢

Construction Estimation Data

💚 Open
Procurement & Permits

Tabular construction cost and estimation records for training AI-driven cost estimation and bid analysis models.

Formats: CSV
License: Unknown
regression
View Dataset →
⛑️

Construction-PPE — Safety Equipment Detection

💚 Open
Construction & Safety

Real construction environment PPE compliance/non-compliance images in Ultralytics YOLO format for worker safety monitoring.

Formats: YOLO
By: Ultralytics
License: Unknown
detection
View Dataset →
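YOLO-format detection labels, as used by the entry above, store one object per line as `class x_center y_center width height`, with coordinates normalized to the image size. The sketch below converts such a line to pixel corners. It assumes the five-field detection layout; the sample line and image size are made up.

```python
def yolo_to_pixels(line, img_w, img_h):
    """Convert 'cls cx cy w h' (normalized) to (x_min, y_min, x_max, y_max) pixels."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return int(cls), (x_min, y_min, x_max, y_max)

# Made-up label: class 0, centered, a quarter of the width and half the height.
class_id, box = yolo_to_pixels("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
```

The mapping from class index to class name lives in the dataset's own configuration file, which this sketch does not read.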
👁️

ConstructionSite-10k

💚 Open
Construction & Safety

10,013 construction site images (7,009 train / 3,004 test) for testing vision-language models on construction inspection tasks.

Size: 10,013 images
Formats: Images, HuggingFace
License: Unknown
classification detection qa
View Dataset →
🌡️

Cool, Quiet City — Urban Comfort

💚 Open
Building Energy

Smartwatch data on residents' perceptions of urban noise and heat collected via the Cozie smartwatch platform. Kaggle ML competition.

Formats: CSV
By: Clayton Miller
License: Unknown
regression classification
View Dataset →
🔩

CSB — Cracks in Steel Bridges

💚 Open
Construction & Safety

Steel bridge images with pixel-wise fatigue crack annotations from Rijkswaterstaat and ProRail, covering cracks, corrosion, and defect-free structures.

Formats: Images, Segmentation masks
By: Rijkswaterstaat, ProRail, Nebest
License: CC BY 4.0
segmentation detection
View Dataset →
🏢

CubiCasa5K

💚 Open
Floor Plans & BIM

5,000 real-world floor plan images from Finnish real estate with 80+ annotation categories including rooms, doors, windows, walls, and stairs.

Size: 5,000 floor plans
Formats: PNG, SVG
By: CubiCasa Oy
License: CC BY 4.0
segmentation detection generation
View Dataset →
🏛️

Data.gov — Building Code & Zoning Datasets

💚 Open
Codes & Regulations

Building permits, zoning GIS data, development permits, property maintenance codes, land development codes, and state energy codes from the US government.

Formats: CSV, GIS, API
By: U.S. Government
License: Government Open
nlp regression
View Dataset →
🌱

EC3 — Embodied Carbon in Construction

💚 Open
Building Energy

150,000+ verified Environmental Product Declarations (EPDs). Largest open-access database for embodied carbon in building materials.

Size: 150,000+ EPDs
Formats: Web API, Database
By: Building Transparency
License: Unknown
regression classification
View Dataset →
🇺🇸

eCFR — Electronic Code of Federal Regulations

💚 Open
Codes & Regulations

All US federal regulations including HUD, ADA, OSHA, and fire codes in XML bulk format with continuous updates.

Formats: XML, HTML
By: U.S. Government Publishing Office
License: Government Open
nlp qa
View Dataset →
🗼

ECP Facade Database

💚 Open
Facades

104 rectified Haussmannian building images from Paris with 7 to 8 semantic classes depending on the annotation version (wall, window, sky, shop, balcony, door, roof, chimney).

Size: 104 images
Formats: JPG, PNG
By: Ecole Centrale Paris
License: Unknown
segmentation detection
View Dataset →
🏙️

eTRIMS Image Database

💚 Open
Facades

60 annotated non-rectified facade images with 4-class and 8-class annotation variants covering irregular patterns.

Size: 60 images
Formats: JPG, PNG
By: University of Bonn
License: Unknown
segmentation
View Dataset →
📋

FloorPlanCAD

💚 Open
Floor Plans & BIM

15,663 real-world CAD floor plans with fine-grained annotations for 30 object categories (doors, windows, furniture, equipment). ICCV 2021.

Size: 15,663 floor plans
Formats: PNG, Vector
By: Zhiwen Fan et al.
License: CC BY-NC 4.0
segmentation detection classification
View Dataset →
💎

Fragments — Open-Source BIM Format

💚 Open
Tools & Scripts

Performance-optimized binary BIM format rendering millions of objects with LOD. Reduces 2GB IFC files to ~200MB with geometry deduplication.

Formats: FRAG, IFC
License: MIT
reconstruction classification
View Dataset →
📄

Free Construction Contract Templates

💚 Open
Procurement & Permits

Basic construction contract templates (lump sum, cost-plus, time & materials, design-build) from TemplateLab, Mastt, Jotform, and eForms.

Formats: DOCX, PDF
License: Custom
nlp
View Dataset →
🌍

Global Procurement Dataset — 72M Contracts

💚 Open
Procurement & Permits

72 million contracts from 42 countries (2006–2021) with buyer/supplier info, geolocation, product classification, price, and corruption risk indicators.

Size: 72 million contracts
Formats: Structured dataset
License: CC BY 4.0
regression nlp classification
View Dataset →
🌍

GlobalBuildingAtlas

💚 Open
Urban & 3D Models

2.75 billion 3D building structures at 3-meter resolution derived from ~800,000 satellite images. Global coverage of footprints, heights, and LoD1 models.

Size: 2.75 billion buildings
Formats: GeoTIFF, GeoJSON, Shapefile
By: Prof. Xiaoxiang Zhu
License: CC BY 4.0
detection reconstruction regression
View Dataset →
🕸️

GraphRAG for Smart Buildings

💚 Open
Tools & Scripts

LLM + Knowledge Graph pipeline for smart building management using IFC data parsed into Neo4j graphs with ifcopenshell.

Formats: IFC, Neo4j, JSON
License: MIT
graph qa nlp
View Dataset →
🌉

GYU-DET — Multi-Defect Bridge Dataset

💚 Open
Construction & Safety

11,123 high-resolution bridge images with 6 defect types (cracks, spalling, seepage, honeycomb surface, exposed rebar, holes). Nature 2025.

Size: 11,123 images
Formats: Images, YOLO annotations
License: CC BY 4.0
detection
View Dataset →
🏺

Heritage Building Defect Detection

💚 Open
Construction & Safety

Defect detection dataset for heritage buildings with bounding box and segmentation mask annotations for inspection applications.

Formats: JPG, Annotations
License: Unknown
detection segmentation classification
View Dataset →
🏛️

Heritage Images & Point Clouds (Zenodo)

💚 Open
Tools & Scripts

Photogrammetric images and annotated point clouds for 5 heritage buildings following ArCH classification standards.

Size: 5 heritage buildings
Formats: Images, Point Cloud
License: CC BY 4.0
point-cloud segmentation reconstruction
View Dataset →
🏘️

HUD SOCDS — US Building Permits by Metro Area

💚 Open
Procurement & Permits

Building permit data for all US Metropolitan Areas, Central Cities, and Suburbs from the U.S. Dept. of Housing and Urban Development.

Formats: CSV, Web query
By: U.S. HUD
License: Government Open
regression
View Dataset →
🔧

Hybrid-CGAN — Synthetic Building Fault Data

💚 Open
Building Energy

Synthetic fault data for building fault detection using EnergyPlus simulation + GANs. 50% improvement in FID scores, classifier accuracy 0.82 → 0.94.

Formats: CSV
By: Jintong Han, Adrian Chong
License: Unknown
generation regression classification
View Dataset →
🔗

Hypergraph Floor Plans

💚 Open
Floor Plans & BIM

Nature Communications framework for automated floor plan generation using hypergraphs. Open-source Python code with graph data structures.

Formats: Python, Graph
By: Ramon Weber
License: MIT
generation graph
View Dataset →
🪟

HZNU Facade Dataset

💚 Open
Facades

624 high-resolution buildings with 43,277 annotated windows (avg 4056×3856 px) from Hangzhou, China. Includes homography matrix annotations.

Size: 624 buildings, 43,277 windows
Formats: JPG, JSON, TXT
By: HZNU (Hangzhou Normal University)
License: CC BY 4.0
detection segmentation
View Dataset →
📖

ICC Digital Codes — Full US Building Codes

💚 Open
Codes & Regulations

Full text of IBC 2024/2021, IRC, IFC (Fire), IMC, IPC, IECC and more. Searchable online free; REST API via Code Connect (paid).

Formats: HTML, PDF, JSON API
By: International Code Council
License: Custom
nlp qa
View Dataset →
🎨

IDSedit — Visual IDS Editor

💚 Open
Tools & Scripts

Node-based visual editor for Information Delivery Specifications (OpenBIM). Drag-and-drop IDS rule creation without XML coding.

Formats: IDS, Visual nodes
By: Louis Trümpler
License: MIT
classification nlp
View Dataset →
💬

IFC BIM QA Dataset

💚 Open
Codes & Regulations

13,485 question-answer pairs covering BIM and IFC domain knowledge for training/testing LLMs on BIM-specific queries.

Size: 13,485 Q&A pairs
Formats: HuggingFace text
License: Unknown
qa nlp
View Dataset →
🏷️

IFC ML Classification — 3D Object Recognition

💚 Open
Tools & Scripts

ML pipelines for automated 3D IFC object classification achieving ~98% accuracy using GNN, Stacked RF, and Deep Learning.

Formats: IFC, Python, Graph
License: Unknown
classification graph
View Dataset →

IFC Model Checker — Open-Source QA

💚 Open
Tools & Scripts

Browser-based IFC validation using IDS standard and IfcOpenShell. Validates IFC models against Information Delivery Specifications.

Formats: IFC, IDS, HTML
License: MIT
classification qa
View Dataset →
💬

IFC-Bench — LLM-based IFC QA Benchmark

💚 Open
Tools & Scripts

21 IFC model projects with 1,027 QA pairs for testing LLMs on natural-language retrieval of building information from IFC models.

Size: 21 projects, 1,027 QA pairs
Formats: IFC, JSON
By: Sylvain Hellin et al.
License: Unknown
qa nlp
View Dataset →
🦆

ifc2duckdb — IFC to SQL Database

💚 Open
Tools & Scripts

Converts IFC BIM files to DuckDB for high-performance sub-second SQL queries on building data. Open-source hackathon project.

Formats: IFC, DuckDB, Python
License: Unknown
classification graph
View Dataset →
🌿

IfcLCA — Embodied Carbon from BIM

💚 Open
Building Energy

Free, open-source browser-based embodied carbon calculator. Converts IFC models to CO2-equivalent data using Swiss national carbon databases.

Formats: IFC, Web
By: Louis Trümpler
License: MIT
regression
View Dataset →
🔵

IFCNet — 19,000 BIM Entity Models

💚 Open
Tools & Scripts

~19,000 CAD models across 65 IFC classes (IFCNetCore: 7,930 objects, 20 classes) extracted from ~1,000 IFC models for BIM entity classification.

Size: ~19,000 CAD models
Formats: IFC, PNG renders
By: RWTH Aachen University
License: Unknown
classification detection graph
View Dataset →
🇮🇪

Ireland Planning Database

💚 Open
Procurement & Permits

National Planning Application Database with spatial and tabular data. Dublin City Council applications from 2003 to present.

Formats: CSV, Shapefile, API
By: data.gov.ie
License: Government Open
regression classification
View Dataset →
🌐

IRFs — Irregular Facades

💚 Open
Facades

1,057 high-quality facade images from 104 countries (1895–2023) with 6 segmentation classes: Background, Plant, Wall, Window, Door, Fence.

Size: 1,057 images
Formats: JPG, PNG, JSON
License: Unknown
segmentation
View Dataset →
🏠

KAAN Dataset

💚 Open
Floor Plans & BIM

800+ Dutch apartment IFC models from real housing projects with annotated floor plans (WKT), material data, and spatial graphs.

Size: 800+ apartment units
Formats: IFC, CSV, WKT
By: Ali Khatami
License: Unknown
classification graph reconstruction
View Dataset →
🏗️

LabelMeFacade

💚 Open
Facades

Highly irregular and diverse building facade images from the LabelMe segmentation dataset, challenging standard segmentation approaches.

Formats: Images, Annotations
License: Unknown
segmentation
View Dataset →
♻️

LCAx — Lifecycle Assessment Data Exchange

💚 Open
Building Energy

Open standard for exchanging LCA results and EPD data across software platforms, born from Denmark's 2023 building regulation on LCA submissions.

Formats: JSON
License: Unknown
regression
View Dataset →
📚

LLM-Knowledge-Pool-RAG

💚 Open
Tools & Scripts

Learning resource and implementation guide for architects building RAG (Retrieval-Augmented Generation) systems with vector databases.

Formats: Python, Vector DB
License: Unknown
nlp qa
View Dataset →
📊

LUMO Benchmark — Outdoor Vibration Monitoring

💚 Open
Construction & Safety

9 m lattice mast structure with 18 reversible damage cases across 6 levels. Accelerometers, strain gauges, and temperature sensors for SHM research.

Formats: Time series, CSV
By: Leibniz University Hannover
License: CC BY 4.0
regression classification
View Dataset →
🚁

MBDD2025 — Building Surface Defects (UAV)

💚 Open
Construction & Safety

14,471 UAV-collected building images across 6 structure types (including steel, concrete, wood, and brick) with defects including cracks, leakage, and corrosion.

Size: 14,471 images
Formats: Images, Annotations
License: CC BY 4.0
detection classification
View Dataset →
📍

MIT Places365 (Building Subset)

💚 Open
Architecture Images

1.8M images (Standard) to 8M (Challenge) across 365 scene categories, including many building-related categories. Downloads are available on Kaggle and the official project site.

Size: 1.8M–8M images
Formats: JPG, TXT, TAR
By: MIT CSAIL
License: Custom
classification
View Dataset →
🇨🇱

MLSTRUCT-FP — 954 Chilean Floor Plans

💚 Open
Floor Plans & BIM

954 high-resolution multi-unit residential floor plans from 165 Chilean projects with wall and slab polygon annotations.

Size: 954 floor plans
Formats: Images, Polygon annotations
By: MLSTRUCT team
License: MIT
segmentation detection
View Dataset →
📸

Modern Architecture — 100K Images

💚 Open
Architecture Images

~100,000 modern architecture building photographs on Kaggle. No labels or annotations — raw photography dataset.

Size: ~100,000 images
Formats: JPG
License: Unknown
classification generation
View Dataset →
🗺️

Modified Swiss Dwellings

💚 Open
Floor Plans & BIM

~16,800 annotated apartment-level floor plans with rich graph annotations covering windows, doors, orientation, and 22 room subtypes.

Size: ~16,800 floor plans
Formats: Pickle, CSV, WKT
By: Casper van Engelenburg et al.
License: Unknown
classification generation graph
View Dataset →
📐

MSD — Floor Plan Generation Benchmark

💚 Open
Floor Plans & BIM

Benchmark dataset for floor plan generation of building complexes, published at ECCV 2024. Available on Kaggle, GitHub, and arXiv.

Formats: PNG, SVG, CSV
By: Casper van Engelenburg et al.
License: Unknown
generation classification
View Dataset →
🗺️

National Zoning Atlas — US Zoning Data

💚 Open
Codes & Regulations

Standardized zoning data covering 200+ regulatory characteristics across the US. Most comprehensive national zoning dataset.

Formats: GIS, Web map
License: Unknown
nlp regression
View Dataset →
🔥

NFPA Free Access — 300+ Fire Codes

💚 Open
Codes & Regulations

Full text of NFPA 101 (Life Safety), NFPA 72 (Fire Alarm), NFPA 13 (Sprinkler), and 300+ fire codes via NFPA LiNK.

Formats: HTML
By: National Fire Protection Association
License: Custom
nlp qa
View Dataset →
🌐

OCDS Registry — Global Public Procurement

💚 Open
Procurement & Permits

100+ country procurement datasets in standardized Open Contracting Data Standard covering planning, tender, award, contract, and implementation stages.

Size: 100+ country datasets
Formats: JSON, CSV
By: Open Contracting Partnership
License: CC BY 4.0
nlp regression
View Dataset →
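Data published under the Open Contracting Data Standard, as in the entry above, is typically delivered as release packages: a `releases` list in which each release carries an `ocid` and a `tag` naming its stage. The sketch below tallies releases by tag. The package content is invented for illustration and the helper name is hypothetical.

```python
import json
from collections import Counter

# Invented release package; identifiers and titles are illustrative only.
package = json.loads("""
{
  "releases": [
    {"ocid": "ocds-example-0001", "tag": ["tender"],
     "tender": {"title": "School roof repair"}},
    {"ocid": "ocds-example-0001", "tag": ["award"],
     "awards": [{"id": "a1"}]},
    {"ocid": "ocds-example-0002", "tag": ["planning"],
     "planning": {}}
  ]
}
""")

def releases_by_tag(pkg):
    """Count releases per stage tag; one release may carry several tags."""
    return Counter(
        tag for release in pkg["releases"] for tag in release.get("tag", [])
    )

def processes(pkg):
    """Distinct contracting processes, identified by ocid."""
    return sorted({release["ocid"] for release in pkg["releases"]})

tag_counts = releases_by_tag(package)
```

Grouping by `ocid` is what ties the planning, tender, award, and contract records of one procurement together.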
🗽

Open City Model — 125M 3D Buildings (USA)

💚 Open
Urban & 3D Models

~125 million 3D building geometries across the entire US derived from USBuildingFootprints, available on AWS S3.

Size: 125 million buildings
Formats: CityGML, CityJSON, Parquet
License: Apache 2.0
reconstruction detection
View Dataset →
🤖

OpenBIM MCP Server — AI + BIM Integration

💚 Open
Tools & Scripts

Model Context Protocol server connecting AI assistants to OpenBIM data. Enables AI to query, reason, and answer questions about IFC models.

Formats: IFC, MCP, Python
License: MIT
qa classification graph
View Dataset →
🔩

OpenBIMtoFEM — BIM to Structural Analysis

💚 Open
Tools & Scripts

Framework converting OpenBIM IFC models to Finite Element Method analysis meshes. Python-based IFC to FEM pipeline.

Formats: IFC, FEM, Python
License: Unknown
reconstruction regression
View Dataset →
📚

OpenConstruction-Datasets — 51+ Dataset Catalog

💚 Open
Construction & Safety

Systematic catalog of 51+ open-access visual datasets for construction AI covering safety, quality, progress, equipment across multiple modalities.

Size: 51+ datasets
Formats: JSON metadata
By: Ruoxin Xiong et al.
License: MIT
detection segmentation
View Dataset →
📁

OSArch Example Files — Open BIM Samples

💚 Open
Tools & Scripts

Curated collection of high-quality BIM files in IFC2X3, IFC4, and IFC4x3 covering architectural, structural, and MEP models.

Formats: IFC
By: OSArch Community
License: CC BY 4.0
reconstruction classification
View Dataset →
🎨

Pix2Pix Facades Dataset

💚 Open
Facades

400 facade photographs paired with hand-labeled semantic maps (walls, windows, doors). The classic pix2pix training dataset.

Size: 400 paired images
Formats: JPG
License: Unknown
generation segmentation
View Dataset →
🎡

Planning London Datahub

💚 Open
Procurement & Permits

Live planning application and development proposal data from all London boroughs, updated daily by the Greater London Authority.

Formats: Web, API
By: Greater London Authority
License: Government Open
regression classification
View Dataset →
🏴󠁧󠁢󠁥󠁮󠁧󠁿

planning.data.gov.uk — UK Planning Applications

💚 Open
Procurement & Permits

National planning and housing data from all English Local Planning Authorities with map view, search, and download in CSV, JSON, and GeoJSON.

Formats: CSV, JSON, GeoJSON
By: UK Ministry of Housing
License: Government Open
regression classification
View Dataset →
🏫

Purdue PTBC — Building Code NLP Corpus

💚 Open
Codes & Regulations

Part-of-speech tagged building code corpus used to train a Bi-LSTM RNN with BERT for building code NLP. The resulting POS tagger reaches 95.11% precision.

Formats: Annotated text
By: Purdue University
License: Unknown
nlp classification
View Dataset →
🐍

PythonForIFC — 12+ BIM Scripts

💚 Open
Tools & Scripts

12+ Python utilities for IFC file manipulation and BIM workflow automation. Open-source scripts for common IFC operations.

Formats: Python, IFC
By: Louis Trümpler
License: MIT
reconstruction classification
View Dataset →
📐

QTOpro — IFC Quantity Takeoff Tool

💚 Open
Tools & Scripts

Browser-based IFC analysis for construction quantity takeoff. No uploads or installation required. Extracts quantities directly from IFC for cost estimation.

Formats: IFC, CSV, Tables
By: Louis Trümpler
License: Unknown
regression reconstruction
View Dataset →
🔄

ResBIM — Synthetic BIM Pairs

💚 Open
Floor Plans & BIM

1,000+ paired samples each containing a parametric 3D Revit BIM model and annotated 2D floor plan, for BIM automation research.

Size: 1,000+ paired samples
Formats: RVT, Annotated 2D plans
By: RogerLiang0725
License: Unknown
reconstruction generation
View Dataset →
📊

ResPlan — 17,000 Residential Floor Plans

💚 Open
Floor Plans & BIM

17,000 vector-graph floor plans with precise wall, door, window, and functional space annotations. Graph representations with NetworkX compatibility.

Size: 17,000 floor plans
Formats: Pickle, Vector, Graph
By: M. Agour et al.
License: MIT
generation graph
View Dataset →
📋

RFPDB.com — Architecture RFP Database

💚 Open
Procurement & Permits

Government, for-profit, and non-profit RFPs with a dedicated architecture category. Free listings with no subscription required.

Formats: Web listings
License: Custom
nlp
View Dataset →
🖤

ROBIN — 510+ B&W Floor Plans

💚 Open
Floor Plans & BIM

510 black-and-white architectural floor plans plus 122 scanned documents (ROBIN++) for document analysis and automatic retrieval.

Size: 510 floor plans
Formats: Images, ZIP
By: Sharma, Gupta et al.
License: Unknown
classification detection
View Dataset →
🏠

RooFormer — 3D Roof Reconstruction

💚 Open
Urban & 3D Models

Deep learning model reconstructing 3D roof models from high-resolution aerial/satellite imagery for automatic LoD2 building generation.

Formats: Images, 3D geometry
By: NUS Urban Analytics Lab
License: Unknown
reconstruction detection
View Dataset →
🏘️

RPLAN — 80,788 Asian Residential Floor Plans

🔶 Request
Floor Plans & BIM

80,788 densely annotated floor plans from real Asian residential buildings with room types, wall distinctions, and door types. Used by HouseGAN.

Size: 80,788 floor plans
Formats: PNG, Raster
By: Wu et al.
License: Custom
generation classification
View Dataset →
🏙️

rrustom/architecture2022clean

💚 Open
Architecture Images

Architecture image dataset (2022) on HuggingFace with Parquet metadata format.

Formats: JPG, PNG, Parquet
License: Unknown
classification generation
View Dataset →
🏫

S3DIS — Stanford Indoor Spaces

🔶 Request
Urban & 3D Models

6 areas, 3 buildings, ~270 rooms with 13 semantic categories per point (including ceiling, floor, wall, beam, column, window, door, and furniture).

Size: 6,020 m², ~270 rooms
Formats: Point Cloud, XYZ, RGB
By: Stanford University
License: Custom
point-cloud segmentation classification
View Dataset →
🦅

SAM.gov — US Federal Contract Opportunities

💚 Open
Procurement & Permits

All US federal government contract opportunities (RFPs, solicitations, awards) for all agencies. REST API, CSV bulk downloads, PostgreSQL snapshots.

Formats: JSON, CSV, PostgreSQL
By: U.S. GSA
License: Government Open
nlp regression
View Dataset →
🧱

SDNET2018 — 56,000 Concrete Crack Images

💚 Open
Construction & Safety

56,000+ images (256×256 px) of cracked and non-cracked concrete from bridge decks, walls, and pavements. Crack widths range from 0.06 mm to 25 mm.

Size: 56,000+ images
Formats: JPEG
By: Marc Maguire et al.
License: CC BY 4.0
classification detection segmentation
View Dataset →
🔷

SESYD — 1,000 Synthetic Vector Floor Plans

💚 Open
Floor Plans & BIM

1,000 synthetic floor plans (16 architectural symbol models). Extended SFPI version: 10,000 images with ~300,000 furniture items.

Size: 1,000–10,000 floor plans
Formats: SVG, EPS, PNG
By: Delalandre et al.
License: Unknown
detection classification
View Dataset →
🏗️

SF Building Permits — 200K+ Records

💚 Open
Procurement & Permits

200,000+ permits (5 years) to 1.1M+ records (1980-2019) with estimated cost, description, permit type, location. Weekly updates from DataSF.

Size: 1.1M+ records
Formats: CSV, API
By: City of San Francisco
License: Government Open
regression classification
View Dataset →
🔨

Shovels.ai — 180M+ US Building Permits

🔒 Paid
Procurement & Permits

180M+ AI-enriched permits across 30M US addresses with inspection pass rates, contractor profiles, and 85% US population coverage.

Size: 180M+ permits
Formats: JSON, Parquet, Snowflake
By: Shovels Inc.
License: Custom
regression classification
View Dataset →
🏗️

SODA — 20K+ Construction Site Images

💚 Open
Construction & Safety

20,000+ images with 15 object classes covering workers, materials, machines, and layouts, collected from multiple construction sites under varied conditions.

Size: 20,000+ images
Formats: Images, Bounding boxes
License: Unknown
detection classification
View Dataset →
🌐

SpatialLM — 3D LLM for Point Clouds

💚 Open
Tools & Scripts

Novel 3D large language model for processing point cloud data with architectural spatial understanding and natural language output.

Formats: Point Cloud, Python
License: Unknown
point-cloud qa classification
View Dataset →
🚶

SPECS — Streetscape Perception Dataset

💚 Open
Urban & 3D Models

Demographically balanced global urban visual perception dataset from a 1,000-person survey across multiple cities. Published in Nature Cities.

Formats: CSV, Images
By: NUS Urban Analytics Lab
License: Unknown
regression classification
View Dataset →
🏗️

StructuralCodes — Engineering Calculations

💚 Open
Tools & Scripts

Python library for structural engineering calculations following Eurocode and fib Model Code standards.

Formats: Python
By: fib International
License: MIT
regression
View Dataset →
🏘️

SYNBUILD-3D

💚 Open
Urban & 3D Models

6.2 million synthetic residential buildings at LoD4 with 3D wireframe graphs, floor plan images, and LiDAR-like roof point clouds.

Size: 6.2 million buildings
Formats: Graph, PNG, Point Cloud
By: Kevin Mayer
License: Unknown
generation regression reconstruction
View Dataset →
🏠

Synthetic Floor Plans (Figshare)

💚 Open
Floor Plans & BIM

2,500 synthetic single-family floor plans from T0 (2 rooms) to T4 (10 rooms) typologies, in black-and-white and color-coded versions.

Size: 2,500 images
Formats: PNG, JPG
License: CC BY 4.0
generation classification
View Dataset →
🌡️

TBBR — Thermal Bridges on Building Rooftops

💚 Open
Building Energy

926 annotated images (68.5 GB) from 6 UAV flights with 5 channels (RGB + thermographic + height). 6,927 thermal bridge annotations.

Size: 926 images, 68.5 GB
Formats: NumPy, COCO JSON
By: Karlsruhe researchers
License: CC BY 4.0
detection segmentation regression
View Dataset →
🇪🇺

TED — EU Public Procurement (TED)

💚 Open
Procurement & Permits

All EU public procurement notices since 1993, the world's largest procurement dataset. REST API, XML bulk downloads, CSV subsets.

Formats: JSON, XML, CSV
By: EU Publications Office
License: Government Open
nlp regression classification
View Dataset →
🖼️

terminusresearch/photo-architecture

💚 Open
Architecture Images

High-resolution building and unique architecture images on HuggingFace with Parquet metadata. Full-resolution architectural photography.

Formats: JPG, PNG, Parquet
License: Unknown
generation classification
View Dataset →
🔷

TopologicPy — 3D Geometric Modeling

💚 Open
Tools & Scripts

Open-source 3D topological modeling library integrated with Fragments for BIM. Python-based 3D topology data with IFC integration.

Formats: Python, 3D topology, IFC
License: GPL-3.0
reconstruction graph point-cloud
View Dataset →
🔭

TUM-FACADE — Point Cloud Benchmark

💚 Open
Facades

33 annotated building facades with ~333 million annotated LiDAR points. Semantic segmentation: windows, doors, balconies, moldings.

Size: 33 facades, 333M points
Formats: LAS, LAZ, CSV
By: TU Munich
License: CC BY 4.0
point-cloud segmentation reconstruction
View Dataset →
🚁

UAVID3D — UAV Building Reconstruction

💚 Open
Urban & 3D Models

21 GB of UAV RGB and thermal imagery for 3D building reconstruction and thermal anomaly detection, including 3D meshes.

Size: 21 GB
Formats: JPG, TIFF, PLY
License: CC BY 4.0
reconstruction segmentation detection
View Dataset →
🔍

UK Contracts Finder

💚 Open
Procurement & Permits

UK government procurement opportunities and awarded contracts in OCDS format. API and CSV bulk downloads via data.gov.uk.

Formats: JSON, CSV
By: UK Crown Commercial Service
License: Government Open
nlp regression
View Dataset →
🇬🇧

Uniclass 2015 — UK Construction Classification

💚 Open
Codes & Regulations

Unified classification system for UK construction covering products, systems, activities, and spaces. Used for BIM and specification classification.

Formats: Classification tables, Download
By: NBS / CPIC
License: Custom
nlp classification
View Dataset →
⬆️

UpCodes — US Building Codes (80+ Jurisdictions)

💚 Open
Codes & Regulations

80+ US jurisdictions, 190K+ local amendments, 6M+ code sections with AI Copilot for code research.

Size: 6M+ code sections
Formats: Web platform
By: UpCodes Inc.
License: Custom
nlp qa
View Dataset →
💵

USAspending.gov — US Federal Spending

💚 Open
Procurement & Permits

All US federal spending on contracts, grants, and awards back to FY2001. Filterable by NAICS codes (23xxxx for construction). REST API and CSV.

Formats: JSON, CSV, PostgreSQL
By: U.S. Department of the Treasury
License: Government Open
regression nlp
View Dataset →
🏙️

VoxCity — 3D City Model Generator

💚 Open
Urban & 3D Models

Open-source Python package for automated 3D city model generation from OpenStreetMap. Outputs voxel grids, GeoJSON, building heights, and vegetation data.

Formats: Python, GeoJSON, Voxel
By: NUS Urban Design Centre
License: MIT
reconstruction regression
View Dataset →

WikiChurches — Fine-Grained Architectural Styles

💚 Open
Architecture Images

9,485 church images with 631 bounding box annotations and architectural style labels from Wikipedia. NeurIPS 2021 Datasets & Benchmarks.

Size: 9,485 images
Formats: JPG, CSV, XML
License: CC BY 4.0
classification detection
View Dataset →
🌐

World Bank Contract Awards

💚 Open
Procurement & Permits

Contract awards for World Bank-funded IDA/IBRD projects worldwide, plus the Global Public Procurement Database covering country procurement systems.

Formats: CSV, Excel, API
By: World Bank Group
License: CC BY 4.0
regression nlp
View Dataset →
🛰️

xBD — Satellite Building Damage Assessment

🔶 Request
Construction & Safety

5,598 satellite image pairs (1024×1024) across 11 disaster events with 4-level damage labels covering hurricanes, wildfires, floods, and earthquakes.

Size: 5,598 image pairs
Formats: Satellite images, GeoJSON
By: MIT Lincoln Laboratory
License: Custom
detection classification segmentation
View Dataset →
📹

YOLOv11 Construction Monitoring

💚 Open
Construction & Safety

Construction site monitoring combining YOLOv11 object detection with Autodesk ACC for automated progress tracking and safety monitoring.

Formats: Video, YOLO annotations
License: Unknown
detection classification
View Dataset →
📡

Z24 Bridge — SHM Benchmark

💚 Open
Construction & Safety

1 year of continuous accelerometer monitoring from the Z24 highway bridge with progressive controlled damage. A widely used structural health monitoring (SHM) benchmark.

Size: 1 year continuous data
Formats: MAT, CSV
By: KU Leuven
License: Unknown
regression classification
View Dataset →
☁️

ZAHA — Large-Scale Point Cloud Facades

💚 Open
Facades

601 million annotated points in 5- and 15-class variants for large-scale facade semantic segmentation. Published at WACV 2025.

Size: 601M annotated points
Formats: LAS, LAZ, CSV
License: CC BY 4.0
point-cloud segmentation
View Dataset →
🏡

ZInD — 1,524 Homes with Floor Plans

🔶 Request
Floor Plans & BIM

71,474 panoramas from 1,524 real unfurnished homes across 20 US cities, with annotated 2D/3D floor plans. CVPR 2021.

Size: 71,474 panoramas, 1,524 homes
Formats: 360° Panoramas, JSON
By: Steve Cruz et al.
License: Custom
reconstruction detection
View Dataset →
