Digital Atlas Hong Kong — Method, Scale & Context of Usage
Comprehensive mathematical representation of Hong Kong's urban commercial structure across 292 planning units, 10,928 hexagonal neighborhoods, and 147,191 places, integrating 20 data sources including satellite imagery, census data, and business records.
What this is#
A mathematical representation of Hong Kong's urban commercial structure. Every planning unit, every neighborhood, and every commercial place in the territory is described by a structured feature vector that captures what it is, what surrounds it, who needs it, and how it is changing.
Built on atlas-1 at /home/azureuser/digital-atlas-hkg/. Total footprint: 6.2 GB.
Scale#
Territory coverage#
- Area: 1,112 km² (292 Tertiary Planning Units)
- Population: 7.47 million (WorldPop corrected + Census 2021)
- Commercial places: 147,191 (Overture Maps + OpenStreetMap, deduplicated)
- Chain brands: 151 detected, 10,474 outlets
- Data sources: 20 independent layers
- Temporal depth: VIIRS nightlights 2023–2025 (37 months), GHSL built-up 2010/2020/2025
Spatial resolution#
| Level | Units | Features | Grain | Role |
|---|---|---|---|---|
| TPU | 292 | 538 | Planning unit (variable size) | Primary analytical unit — census-aligned, archetype-classified |
| Hex-8 | 1,896 | 486 | ~460m edge (~0.74 km²) | Neighborhood — boundary/gateway/cluster detection |
| Hex-9 | 10,928 | 205 | ~175m edge (~0.105 km²) | Street-level — satellite, terrain, land use, building structure |
| Places | 147,191 | 142 | Individual business | Micro-context — competition, demand pull, synergy, anchors |
Feature count: 1,371 total across all levels#
Method#
Data acquisition (20 sources)#
| Source | What | Records |
|---|---|---|
| Overture Maps 2026-04 | Places, buildings, roads, land use, water, infrastructure, divisions | 1.08M features |
| OpenStreetMap (34 layers) | MTR stations, bus stops, restaurants, schools, hospitals, parks, etc. | 46,728 POIs |
| Census 2021 (C&SD) | Population, age, sex, income, occupation, housing, commute by STPU | 274 × 208 columns |
| CSDI Planning Dept | 292 TPU boundary polygons | 292 polygons |
| NASA VIIRS Black Marble | Monthly nighttime radiance composites | 37 months |
| GHSL (EU JRC) | Built-up surface, population, building height, urbanization class | 2010/2020/2025 epochs |
| Copernicus GLO-30 | Digital elevation model at 30m | 2 tiles |
| ESA WorldCover 2021 | 10m land cover (11 classes) | 2 tiles |
| WorldPop 2020 | 100m population grid (corrected ×0.593) | Territory-wide |
| Hansen/UMD | Global forest change (treecover, loss, gain) | 30m |
| Sentinel-2 | Cloud-free optical composite (NDVI) | 6 scenes medianed |
| Meta HRSL | 30m demographic strata (7 age/gender layers) | Territory-wide |
| MTR opendata | Station routes, fares, LRT | 5 CSVs |
| KMB ETA API | 6,715 bus stops + routes | 3 JSONs |
| FEHD (data.gov.hk) | 17,188 licensed restaurant premises | F&B ground truth |
| EDB (data.gov.hk) | 3,486 schools with lat/lng | Education anchor |
| data.gov.hk bulk | Census, property, buildings, schools, health, transit | 118+ files |
| Serper.dev SERP | 161 HK mall tenant directory URLs | 161 malls |
| OSM HK PBF | Full territory OSM extract | 41 MB |
| Overture Divisions | SAR, districts, localities, microhoods | 441 features |
Region representation pipeline#
Phase 1: Boundaries + grid
- 292 TPU polygons from CSDI
- H3 resolution-9 hex grid clipped to land (10,928 cells, 1,124 km²)
- H3 resolution-8 parent aggregation (1,896 cells)
- Hex → TPU spatial assignment
Phase 2: Structural features
- Overture buildings (302K) → per-hex height/area/count
- Overture roads (452K segments) → per-hex length/density by class
- Overture land use (40K) → per-hex zoning composition (% residential/commercial/park)
- WorldCover → per-hex land cover binary flags (tree/built/water)
- DEM → elevation, hillside flag, terrain ruggedness
- Coastline → coast proximity, waterfront flag
Phase 3: Satellite intelligence
- VIIRS nightlights → per-hex radiance mean, per-capita, commercial indicator, growth corridors, decline zones, YoY change
- GHSL → built-up surface 2010/2020/2025 (age proxy), population, building height, urbanization class
- Hansen → treecover, forest loss/gain
- Sentinel-2 → NDVI vegetation index
Phase 4: Place composition
- 147K places spatially joined to hex-9 and TPU
- Per-region: L1 counts (11 categories), L2 counts (30 categories), top-20 fine categories
- Category entropy, cuisine diversity (22 types), chain ratio, brand penetration
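The category entropy and chain ratio in Phase 4 can be illustrated with a minimal sketch. It assumes Shannon entropy in bits over L1 categories; the source does not state the entropy base or the exact input columns, so the `l1` and `is_chain` keys and the sample rows are hypothetical.

```python
import math
from collections import Counter

def category_entropy(categories):
    """Shannon entropy (bits) of the category mix in one region."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def chain_ratio(is_chain_flags):
    """Share of places in a region that belong to a detected chain."""
    flags = list(is_chain_flags)
    return sum(flags) / len(flags) if flags else 0.0

# Hypothetical places falling in one hex-9 cell.
hex_places = [
    {"l1": "food", "is_chain": True},
    {"l1": "food", "is_chain": False},
    {"l1": "retail", "is_chain": False},
    {"l1": "services", "is_chain": True},
]
entropy = category_entropy(p["l1"] for p in hex_places)
ratio = chain_ratio(p["is_chain"] for p in hex_places)
```

A region dominated by one category scores near 0 bits, while an even mix across all 11 L1 groups would approach log2(11).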
Phase 5: Census demographics
- 206 features from Census 2021 STPU (166/292 TPUs matched)
- Age brackets, sex ratio, income by occupation, dwelling type, commute patterns, ethnicity, language, literacy
Phase 6: Verticality
- Estimated floors (GHSL height / 3.5m)
- FAR proxy (GFA / hex area)
- Stacking intensity (height / built fraction)
- Podium-tower signal, skyline prominence
- Industrial conversion detection
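The three ratio features in Phase 6 reduce to simple arithmetic, sketched below. The source does not say how GFA is derived, so this sketch assumes GFA is building footprint area times estimated floors; the function name, argument names, and example values are hypothetical.

```python
FLOOR_HEIGHT_M = 3.5       # metres per storey, as stated in Phase 6
HEX9_AREA_M2 = 105_000.0   # ~0.105 km² per hex-9 cell

def verticality_features(ghsl_height_m, footprint_m2, built_fraction,
                         hex_area_m2=HEX9_AREA_M2):
    """Per-hex verticality ratios under the stated assumptions."""
    est_floors = ghsl_height_m / FLOOR_HEIGHT_M
    # Assumption: gross floor area = footprint area x estimated floors.
    gfa_m2 = footprint_m2 * est_floors
    far_proxy = gfa_m2 / hex_area_m2
    # Guard against hexes with no built-up surface.
    stacking = ghsl_height_m / built_fraction if built_fraction > 0 else 0.0
    return {
        "est_floors": est_floors,
        "far_proxy": far_proxy,
        "stacking_intensity": stacking,
    }

example = verticality_features(ghsl_height_m=35.0, footprint_m2=21_000.0,
                               built_fraction=0.5)
```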
Phase 7: Proxy features
- Daytime population (office×15 + retail×8 + F&B×5 + hotel×20 + school×25)
- Tourism intensity (hotels + attractions + landmarks)
- Income proxy (building height + nightlight + services)
- Footfall proxy (transit + places + nightlight + pop)
- Night economy (bars + hospitality + nightlight)
- Pedestrian connectivity, mixed-use index, green ratio
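As a worked example of the daytime population proxy, the sketch below applies the stated per-place weights to place counts in one region. The dictionary keys are hypothetical, and the ratio to resident population is an assumed definition of the daytime ratio, not one confirmed by the source.

```python
# Per-place weights taken from the Phase 7 description.
DAYTIME_WEIGHTS = {"office": 15, "retail": 8, "fnb": 5, "hotel": 20, "school": 25}

def daytime_population_proxy(place_counts):
    """Weighted sum of place counts; missing categories count as zero."""
    return sum(w * place_counts.get(cat, 0) for cat, w in DAYTIME_WEIGHTS.items())

def daytime_ratio(place_counts, resident_population):
    """Assumed definition: daytime proxy divided by resident population."""
    if resident_population <= 0:
        return 0.0
    return daytime_population_proxy(place_counts) / resident_population

# Hypothetical counts for one region.
counts = {"office": 10, "retail": 5, "fnb": 20, "hotel": 1, "school": 2}
proxy = daytime_population_proxy(counts)   # 150 + 40 + 100 + 20 + 50
ratio = daytime_ratio(counts, resident_population=300)
```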
Phase 8: Graph features (TPU only)
- Queen contiguity adjacency (avg 4.9 neighbors)
- Spatial lag + contrast for 12 core features
- Ring-1 neighbor sums
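The graph features can be sketched for a single feature as follows. This assumes the spatial lag is the unweighted mean over adjacent TPUs and the contrast is the unit's own value minus that lag; the source does not define either term precisely. The adjacency map and TPU ids are hypothetical.

```python
def graph_features(values, adjacency):
    """Spatial lag, contrast, and ring-1 sum for one feature.

    values:    {unit_id: feature value}
    adjacency: {unit_id: [neighbouring unit_ids]} (e.g. queen contiguity)
    """
    out = {}
    for unit, value in values.items():
        neighbours = [values[n] for n in adjacency.get(unit, []) if n in values]
        ring1_sum = sum(neighbours)
        # Assumption: lag = mean of neighbours; isolated units get 0.
        lag = ring1_sum / len(neighbours) if neighbours else 0.0
        out[unit] = {"lag": lag, "contrast": value - lag, "ring1_sum": ring1_sum}
    return out

# Hypothetical three-TPU neighbourhood: A touches B and C.
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
place_density = {"A": 10.0, "B": 4.0, "C": 6.0}
features = graph_features(place_density, adjacency)
```

A large positive contrast marks a unit that stands out above its surroundings; a negative one marks a unit overshadowed by its neighbours.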
Phase 9: Composite scores + indices
- Vitality, accessibility, demand, competition, growth potential, saturation
- Skyline, redevelopment pressure, verticality, compactness, mixed-height
Phase 10: Archetype clustering
- K-means (k=5) on 52 normalized features
- Tourist/Entertainment (12 TPUs), Dense Residential (30), Suburban Residential (31), New Development (80), Country/Green Belt (91)
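For readers unfamiliar with the clustering step, here is a dependency-free sketch of min-max normalization followed by Lloyd's k-means. It is illustrative only: the source does not say which normalization or k-means implementation was used, and the toy two-feature data stands in for the 52 real features.

```python
import random

def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm with seeded random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data: two obvious groups in a hypothetical 2-feature space.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels, toy_centroids = kmeans(min_max_normalize(toy), k=2)
```

Cluster ids are arbitrary, so the archetype names have to be assigned afterwards by inspecting each cluster's centroid.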
Phase 11: Hex-8 neighborhood layer
- Aggregated hex-9 features upward
- Broadcast TPU features downward (265 features)
- Neighbor influence: boundary/gateway/cluster center flags, interface score, gradient position, net demand flow
Place representation pipeline#
Step 1: Source extraction
- Overture Maps 141K places + OSM 47K POIs
Step 2: Conflation + dedup
- Spatial join (50m radius) + name similarity
- 20K net-new from OSM after deduplication
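The conflation rule in Step 2 can be sketched as a distance check combined with a name-similarity check. The 50m radius comes from the text; the 0.8 similarity threshold, the use of `difflib`, the record layout, and the sample records are all assumptions made for illustration.

```python
import math
from difflib import SequenceMatcher

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_duplicate(a, b, max_dist_m=50.0, min_name_sim=0.8):
    """True when two records are close AND similarly named."""
    close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m
    sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return close and sim >= min_name_sim  # 0.8 is an assumed threshold

def net_new(osm_pois, overture_places):
    """OSM records that match no Overture record (brute force, O(n*m))."""
    return [p for p in osm_pois
            if not any(is_duplicate(p, q) for q in overture_places)]

# Hypothetical records.
overture = [{"name": "Starbucks Coffee", "lat": 22.2819, "lon": 114.1582}]
osm = [
    {"name": "Starbucks coffee", "lat": 22.2820, "lon": 114.1582},     # ~11 m away
    {"name": "Tsui Wah Restaurant", "lat": 22.2819, "lon": 114.1582},  # same spot
    {"name": "Starbucks Coffee", "lat": 22.2919, "lon": 114.1582},     # ~1.1 km away
]
added = net_new(osm, overture)
```

At the scale described (141K against 47K records), a spatial index would replace the brute-force loop; the matching logic stays the same.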
Step 3: Category taxonomy
- L1: 11 groups (food, retail, services, health, education, leisure, hospitality, transit, community, auto, other)
- L2: 30 groups (cafe_coffee, grocery, general_dining, bar_nightlife, etc.)
Step 4: Chain detection
- 151 known HK chain brands matched by name
- 10,474 outlets detected (McDonald's 880, Starbucks 729, 7-Eleven 428, etc.)
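Name-based chain matching could look like the sketch below. The source only says brands are "matched by name", so the normalization rules and token-boundary matching here are assumptions, and the brand list is a three-item illustrative subset of the 151.

```python
import re

# Illustrative subset; the real list holds 151 brands.
CHAIN_BRANDS = ["McDonald's", "Starbucks", "7-Eleven"]

def normalize(name):
    """Lowercase, drop apostrophes, collapse other punctuation to spaces."""
    name = re.sub("['\u2019]", "", name.lower())
    return re.sub(r"[^a-z0-9]+", " ", name).strip()

def detect_chain(place_name, brands=CHAIN_BRANDS):
    """Return the first brand whose normalized tokens appear in the name."""
    padded = f" {normalize(place_name)} "
    for brand in brands:
        # Padding with spaces enforces whole-token matches.
        if f" {normalize(brand)} " in padded:
            return brand
    return None

matches = [detect_chain(n) for n in
           ["McDonald's (Central)", "7-Eleven Queen's Road", "Star Cafe"]]
```

Whole-token matching avoids false hits on substrings, at the cost of missing transliterated or abbreviated outlet names.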
Step 5: Spatial assignment
- H3 res-9 + res-8, TPU + district via point-in-polygon
Step 6: Competition
- cKDTree spatial self-join per L2 category
- Same-category count within 200m and 500m
- Nearest competitor distance
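The competition features can be illustrated without SciPy. The pipeline uses a cKDTree self-join; the sketch below swaps in a brute-force pairwise search that yields the same kind of counts on small inputs. It assumes places carry projected coordinates in metres under hypothetical `x` and `y` keys.

```python
import math
from collections import defaultdict

def competition_features(places, radii_m=(200, 500)):
    """Same-category counts within each radius plus nearest competitor."""
    by_category = defaultdict(list)
    for p in places:
        by_category[p["l2"]].append(p)

    out = {}
    for peers in by_category.values():
        for p in peers:
            # Distances to every other place in the same L2 category.
            dists = sorted(
                math.dist((p["x"], p["y"]), (q["x"], q["y"]))
                for q in peers if q is not p
            )
            feats = {f"competitors_{r}m": sum(d <= r for d in dists) for r in radii_m}
            feats["nearest_competitor_m"] = dists[0] if dists else None
            out[p["id"]] = feats
    return out

# Hypothetical places on a line, coordinates in metres.
sample = [
    {"id": "a", "l2": "cafe_coffee", "x": 0.0, "y": 0.0},
    {"id": "b", "l2": "cafe_coffee", "x": 150.0, "y": 0.0},
    {"id": "c", "l2": "cafe_coffee", "x": 400.0, "y": 0.0},
    {"id": "g", "l2": "grocery", "x": 10.0, "y": 0.0},
]
comp = competition_features(sample)
```

The brute-force loop is quadratic per category, which is why a KD-tree is the sensible choice at 147K places.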
Step 7: Complementary
- Cross-category count within 300m using per-L1 KD-trees
- Complementary diversity (unique L1 categories within 300m)
Step 8: Anchor proximity
- 16 anchor types from OSM (MTR, schools, hospitals, hotels, malls, supermarkets, parks, bus stops, etc.)
- Per-anchor: count within radius + nearest distance + boolean flag
- Composite anchor_score (weighted across all types)
Step 9: Demand pull
- 6 demand sources: office (nightlight-based), residential (WorldPop 500m), transit (MTR decay), hotel, school, mall
- Distance-decay weighted (exponential, halflife 200m)
- Composite demand_context_score
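An exponential decay with a 200m half-life means a source's weight halves every 200m. The sketch below shows that weighting for one demand source; the `(magnitude, distance_m)` input shape is an assumption, and the weights that combine the six sources into demand_context_score are not given in the source, so that step is left out.

```python
HALFLIFE_M = 200.0  # from the Step 9 description

def decay_weight(distance_m, halflife_m=HALFLIFE_M):
    """Exponential distance decay: 1.0 at 0 m, 0.5 at one half-life."""
    return 0.5 ** (distance_m / halflife_m)

def demand_pull(sources, halflife_m=HALFLIFE_M):
    """Sum of source magnitudes, each discounted by distance.

    sources: iterable of (magnitude, distance_m) pairs (assumed shape).
    """
    return sum(m * decay_weight(d, halflife_m) for m, d in sources)

# Two hypothetical hotels of equal size, one adjacent and one 200 m away.
pull_hotel = demand_pull([(100.0, 0.0), (100.0, 200.0)])
```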
Step 10: Co-location synergy
- 10 category-pair synergies, each fires ONLY for the target category
- cafe×office, grocery×residential, convenience×transit, restaurant×hotel, gym×cafe, pharmacy×clinic, bar×restaurant, school×tutoring, bank×office, bakery×cafe
Step 11: Building context
- 12 features from host hex-9: height, floors, FAR, podium, stacking, GHSL height, built fraction, land use, terrain, waterfront, hillside
Step 12: Neighborhood character
- 14 features broadcast from parent TPU: archetype, vitality, demand, competition, accessibility, population density, daytime ratio, income proxy, cuisine diversity, nightlight trend, new development
Context of usage#
Site selection — "Where should brand X open next?"#
Query the place table for gaps: high demand_context_score, low competitors_200m, and a suitable archetype. The feature vector tells you not just where but why: which demand source (office workers, residents, or tourists) drives the opportunity.
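A gap query of this kind might be sketched as a filter over place rows. The thresholds, the `char_archetype` column name, and the sample rows are hypothetical; only demand_context_score, competitors_200m, pull_office, and pull_residential are names that appear in this document.

```python
def find_site_gaps(places, min_demand=0.7, max_competitors=1,
                   archetypes=("Dense Residential",)):
    """High demand, low competition, in a wanted archetype; best first."""
    hits = [
        p for p in places
        if p["demand_context_score"] >= min_demand
        and p["competitors_200m"] <= max_competitors
        and p["char_archetype"] in archetypes
    ]
    return sorted(hits, key=lambda p: p["demand_context_score"], reverse=True)

def dominant_pull(place):
    """Name of the strongest pull_* feature, i.e. the 'why'."""
    pulls = {k: v for k, v in place.items() if k.startswith("pull_")}
    return max(pulls, key=pulls.get) if pulls else None

# Hypothetical rows from the place table.
rows = [
    {"id": 1, "demand_context_score": 0.9, "competitors_200m": 0,
     "char_archetype": "Dense Residential", "pull_office": 0.2, "pull_residential": 0.8},
    {"id": 2, "demand_context_score": 0.8, "competitors_200m": 5,
     "char_archetype": "Dense Residential", "pull_office": 0.6, "pull_residential": 0.3},
    {"id": 3, "demand_context_score": 0.75, "competitors_200m": 1,
     "char_archetype": "Dense Residential", "pull_office": 0.7, "pull_residential": 0.1},
]
gaps = find_site_gaps(rows)
reasons = {p["id"]: dominant_pull(p) for p in gaps}
```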
Portfolio analytics — "How diversified is this REIT's portfolio?"#
Map each property to a TPU archetype. Measure concentration risk across the 5 archetypes. Flag properties in decline zones (nl_growth_pct < -20%) or high-redevelopment areas (idx_redevelopment_pressure > 0.7).
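One way to quantify the concentration described here is a Herfindahl-Hirschman index over archetype shares. The HHI is an illustrative choice, not something the source prescribes. The sketch also assumes nl_growth_pct is stored in percent units (so -20 means -20%); the property records are hypothetical.

```python
from collections import Counter

def portfolio_report(properties, decline_pct=-20.0, redev_threshold=0.7):
    """Archetype shares, HHI concentration, and flagged property names."""
    counts = Counter(p["archetype"] for p in properties)
    n = len(properties)
    shares = {a: c / n for a, c in counts.items()}
    # HHI ranges from 1/k (evenly spread over k archetypes) to 1.0.
    hhi = sum(s * s for s in shares.values())
    flagged = [
        p["name"] for p in properties
        if p["nl_growth_pct"] < decline_pct
        or p["idx_redevelopment_pressure"] > redev_threshold
    ]
    return shares, hhi, flagged

# Hypothetical four-property portfolio.
portfolio = [
    {"name": "P1", "archetype": "Dense Residential",
     "nl_growth_pct": 3.0, "idx_redevelopment_pressure": 0.2},
    {"name": "P2", "archetype": "Dense Residential",
     "nl_growth_pct": -25.0, "idx_redevelopment_pressure": 0.1},
    {"name": "P3", "archetype": "Tourist/Entertainment",
     "nl_growth_pct": 1.0, "idx_redevelopment_pressure": 0.8},
    {"name": "P4", "archetype": "New Development",
     "nl_growth_pct": 12.0, "idx_redevelopment_pressure": 0.3},
]
shares, hhi, flagged = portfolio_report(portfolio)
```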
Competitive landscape — "Who are my competitors and what's around them?"#
For any place: competitors_200m gives direct competition count, nearest_competitor_m gives breathing room, complementary_diversity shows ecosystem richness. Compare against archetype averages to know if competition is above or below normal for that neighborhood type.
Catchment analysis — "Who is my customer?"#
pull_residential tells you how many residents are nearby. pull_office tells you if office workers drive demand. char_daytime_ratio reveals if the area is office-dominant (>1) or residential-dominant (<1). Census features give age/income/ethnicity breakdown.
Growth corridor detection — "Where is the city growing?"#
nl_growth_corridor flags hexes brightening >20%. proxy_new_development shows GHSL built-surface change. score_growth_potential combines both with low-competition signal. Track these quarterly via VIIRS monthly updates.
Neighborhood scoring — "Is this a walkable 15-minute neighborhood?"#
Hex-8 level: r8_walkability combines transit_score + connectivity + places + anchors. Check osm_*_dist_m for specific amenity distances. r8_residential_quality combines green + parks + schools + low density.
Micrograph construction — "Star diagram for any place"#
Every place has a 4-arm context: T1 (transit_score), competitors (competitors_200m), demand magnets (complementary_*), and anchor quality (anchor_score). Building context adds the vertical dimension.
Urban planning — "Where is redevelopment pressure highest?"#
idx_redevelopment_pressure = old buildings (bld_age_index) + low-rise (bld_lowrise_ratio) + high vitality (score_vitality). Cross with census_oq_* (housing tenure) to identify public housing renewal candidates.
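The index is described as a combination of three components, but the weights are not given. The sketch below assumes each component is already scaled to [0, 1] and uses equal weights purely for illustration; the 0.7 cut-off is the one quoted in the portfolio use case.

```python
def redevelopment_pressure(bld_age_index, bld_lowrise_ratio, score_vitality,
                           weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mean of the three components (equal weights are assumed)."""
    parts = (bld_age_index, bld_lowrise_ratio, score_vitality)
    return sum(w * x for w, x in zip(weights, parts))

# Hypothetical TPU: old, mostly low-rise, and commercially lively.
pressure = redevelopment_pressure(0.9, 0.6, 0.9)
is_high_pressure = pressure > 0.7
```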
Validation summary#
| Test | Result |
|---|---|
| Sanity checks (41 tests) | 41/41 pass (100%) |
| Deep validation (30 tests) | 27/30 pass (90%) |
| Place logic checks (13 tests) | 13/13 pass (100%) |
| Demand pull tests (6 tests) | 6/6 pass (100%) |
| Census-WorldPop correlation | r=0.96 |
| Population gap vs official | 1.9% |
| Place deduplication | 0 remaining duplicates |
| GHSL temporal monotonicity | 0 violations (2010 ≤ 2020 ≤ 2025) |
| Archetype spot checks | All 5 known locations correct |
| F&B vs FEHD ground truth | 2.46x ratio (expected 2-3x) |
Files on atlas-1#
/home/azureuser/digital-atlas-hkg/
  data/outputs/
    tpu_features_final.parquet        292 × 538            1.0 MB
    h3_res8_features.parquet          1,896 × 486          2.3 MB
  data/hex_v9/
    hkg_hex_v9_features_v4.parquet    10,928 × 205         6.6 MB
  data/places_consolidated/
    hkg_places_final.parquet          147,191 × 142        32.4 MB
  data/boundaries/
    hkg_tpu.geojson                   292 TPU polygons     38 MB
    hkg_districts.geojson             19 districts         470 KB
    hkg_sar.geojson                   territory            84 KB
    hkg_hex_v9_land.geojson           10,928 hex-9 polys   6.2 MB
  data/serving/                       API-ready JSON       829 MB
    tpu.json, hex8.json, hex9.json, places.json, places_slim.json
    tpu_geo.geojson, hex8_geo.geojson, hex9_geo.geojson
    feature_catalog.json, archetypes.json, places_schema.json
    places_methodology.json, places_examples.json, places_stats.json
    place_representation.json, manifest.json, models/
  model/
    v7_gap_model.pkl                  R²=0.923             3.4 MB
    v8_population_model.pkl           R²=0.819             7.8 MB
  docs/
    HKG_DATASET_OVERVIEW.html         single-page summary
    HKG_ATLAS_COMBINED.html           7-tab full report
    PLACES_LAYER.md                   places documentation
    + 7 topic-specific HTML reports