
# Digital Atlas Hong Kong — Method, Scale & Context of Usage

Comprehensive mathematical representation of Hong Kong's urban commercial structure across 292 planning units, 10,928 street-level hexagons, and 147,191 places, integrating 20 data sources including satellite imagery, census data, and business records.



## What this is

A mathematical representation of Hong Kong's urban commercial structure. Every planning unit, neighborhood, and commercial place in the territory is described by a structured feature vector that captures what it is, what surrounds it, who needs it, and how it is changing.

Built on atlas-1 at /home/azureuser/digital-atlas-hkg/. Total footprint: 6.2 GB.


## Scale

### Territory coverage

- Area: 1,112 km² (292 Tertiary Planning Units)
- Population: 7.47 million (WorldPop corrected + Census 2021)
- Commercial places: 147,191 (Overture Maps + OpenStreetMap, deduplicated)
- Chain brands: 151 detected, 10,474 outlets
- Data sources: 20 independent layers
- Temporal depth: VIIRS nightlights 2023–2025 (37 months), GHSL built-up 2010/2020/2025

### Spatial resolution

| Level | Units | Features | Grain | Role |
|---|---|---|---|---|
| TPU | 292 | 538 | Planning unit (variable size) | Primary analytical unit: census-aligned, archetype-classified |
| Hex-8 | 1,896 | 486 | ~460 m edge (~0.74 km²) | Neighborhood: boundary/gateway/cluster detection |
| Hex-9 | 10,928 | 205 | ~175 m edge (~0.105 km²) | Street level: satellite, terrain, land use, building structure |
| Places | 147,191 | 142 | Individual business | Micro-context: competition, demand pull, synergy, anchors |

Total feature count across all levels: 1,371 (538 + 486 + 205 + 142).


## Method

### Data acquisition (20 sources)

| Source | What | Records |
|---|---|---|
| Overture Maps 2026-04 | Places, buildings, roads, land use, water, infrastructure, divisions | 1.08M features |
| OpenStreetMap (34 layers) | MTR stations, bus stops, restaurants, schools, hospitals, parks, etc. | 46,728 POIs |
| Census 2021 (C&SD) | Population, age, sex, income, occupation, housing, commute by STPU | 274 × 208 columns |
| CSDI Planning Dept | 292 TPU boundary polygons | 292 polygons |
| NASA VIIRS Black Marble | Monthly nighttime radiance composites | 37 months |
| GHSL (EU JRC) | Built-up surface, population, building height, urbanization class | 2010/2020/2025 epochs |
| Copernicus GLO-30 | Digital elevation model at 30 m | 2 tiles |
| ESA WorldCover 2021 | 10 m land cover (11 classes) | 2 tiles |
| WorldPop 2020 | 100 m population grid (corrected ×0.593) | Territory-wide |
| Hansen/UMD | Global forest change (tree cover, loss, gain) | 30 m grid |
| Sentinel-2 | Cloud-free optical composite (NDVI) | 6 scenes, median composite |
| Meta HRSL | 30 m demographic strata (7 age/gender layers) | Territory-wide |
| MTR opendata | Station routes, fares, LRT | 5 CSVs |
| KMB ETA API | 6,715 bus stops + routes | 3 JSONs |
| FEHD (data.gov.hk) | 17,188 licensed restaurant premises | F&B ground truth |
| EDB (data.gov.hk) | 3,486 schools with lat/lng | Education anchor |
| data.gov.hk bulk | Census, property, buildings, schools, health, transit | 118+ files |
| Serper.dev SERP | 161 HK mall tenant directory URLs | 161 malls |
| OSM HK PBF | Full territory OSM extract | 41 MB |
| Overture Divisions | SAR, districts, localities, microhoods | 441 features |

### Region representation pipeline

**Phase 1: Boundaries + grid**
- 292 TPU polygons from CSDI
- H3 resolution-9 hex grid clipped to land (10,928 cells, 1,124 km²)
- H3 resolution-8 parent aggregation (1,896 cells)
- Hex → TPU spatial assignment

**Phase 2: Structural features**
- Overture buildings (302K) → per-hex height/area/count
- Overture roads (452K segments) → per-hex length/density by road class
- Overture land use (40K) → per-hex zoning composition (% residential/commercial/park)
- WorldCover → per-hex land-cover binary flags (tree/built/water)
- DEM → elevation, hillside flag, terrain ruggedness
- Coastline → coast proximity, waterfront flag

**Phase 3: Satellite intelligence**
- VIIRS nightlights → per-hex radiance mean, per-capita radiance, commercial indicator, growth corridors, decline zones, YoY change
- GHSL → built-up surface 2010/2020/2025 (age proxy), population, building height, urbanization class
- Hansen → tree cover, forest loss/gain
- Sentinel-2 → NDVI vegetation index
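
The nightlight growth flags used later in this document (a growth corridor when a hex brightens by more than 20%, a decline zone when `nl_growth_pct` falls below -20%) can be sketched as below. This is a minimal sketch, assuming growth is the percent change between the mean of the first and the last 12 monthly radiance values of a hex; the atlas's actual baseline window is not documented here, and `nl_decline_zone` is a hypothetical column name.

```python
from statistics import mean


def nl_growth_pct(monthly_radiance: list[float], window: int = 12) -> float:
    """Percent change from the mean of the first `window` months to the last `window` months.

    Assumption: the atlas's exact baseline definition is not documented.
    """
    if len(monthly_radiance) < 2 * window:
        raise ValueError("need at least two non-overlapping windows")
    before = mean(monthly_radiance[:window])
    after = mean(monthly_radiance[-window:])
    if before == 0:
        return float("inf") if after > 0 else 0.0
    return 100.0 * (after - before) / before


def growth_flags(growth_pct: float, threshold: float = 20.0) -> dict[str, bool]:
    """Corridor: brightened by more than `threshold` %; decline: dimmed by more than `threshold` %."""
    return {
        "nl_growth_corridor": growth_pct > threshold,
        "nl_decline_zone": growth_pct < -threshold,  # hypothetical column name
    }


# 37 synthetic months: flat at 10.0, with the final 12 months at 13.0
series = [10.0] * 25 + [13.0] * 12
pct = nl_growth_pct(series)
print(round(pct, 1), growth_flags(pct))
```

Comparing window means rather than two single months damps the month-to-month noise typical of nightlight composites.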

**Phase 4: Place composition**
- 147K places spatially joined to hex-9 and TPU
- Per region: L1 counts (11 categories), L2 counts (30 categories), top-20 fine categories
- Category entropy, cuisine diversity (22 types), chain ratio, brand penetration
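
Category entropy and chain ratio for one region can be sketched as follows, assuming Shannon entropy with the natural logarithm over a list of category labels; the atlas's log base and the category level it uses are not stated here.

```python
import math
from collections import Counter


def category_entropy(categories: list[str]) -> float:
    """Shannon entropy (natural log) of a region's category mix; 0 when one category dominates fully."""
    if not categories:
        return 0.0
    n = len(categories)
    h = -sum((c / n) * math.log(c / n) for c in Counter(categories).values())
    return h + 0.0  # turns -0.0 (single category) into 0.0


def chain_ratio(is_chain: list[bool]) -> float:
    """Share of the region's places that belong to a detected chain brand."""
    return sum(is_chain) / len(is_chain) if is_chain else 0.0


mix = ["food", "food", "retail", "retail"]
print(category_entropy(mix))  # ln 2, about 0.693, for an even two-way split
print(chain_ratio([True, False, False, True]))
```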

**Phase 5: Census demographics**
- 206 features from Census 2021 STPU data (166/292 TPUs matched)
- Age brackets, sex ratio, income by occupation, dwelling type, commute patterns, ethnicity, language, literacy

**Phase 6: Verticality**
- Estimated floors (GHSL height / 3.5 m)
- FAR proxy (GFA / hex area)
- Stacking intensity (height / built fraction)
- Podium-tower signal, skyline prominence
- Industrial conversion detection
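
The verticality ratios can be sketched as follows. The 3.5 m storey height comes from the text; how the atlas computes GFA is not stated, so this sketch assumes GFA is the sum of footprint area times estimated floors over the buildings in a hex. The `(footprint_m2, height_m)` input format is hypothetical.

```python
FLOOR_HEIGHT_M = 3.5  # storey height used in the atlas's floor estimate


def estimated_floors(height_m: float) -> float:
    """Floors estimated as building height divided by a 3.5 m storey."""
    return height_m / FLOOR_HEIGHT_M


def far_proxy(buildings: list[tuple[float, float]], hex_area_m2: float) -> float:
    """FAR proxy = GFA / hex area.

    Assumption: GFA is approximated as sum(footprint_m2 * estimated floors) over
    the buildings in the hex, given as (footprint_m2, height_m) pairs.
    """
    gfa = sum(footprint * estimated_floors(height) for footprint, height in buildings)
    return gfa / hex_area_m2


def stacking_intensity(height_m: float, built_fraction: float) -> float:
    """Height divided by the built-up fraction (0 to 1) of the hex; 0 when nothing is built."""
    return height_m / built_fraction if built_fraction > 0 else 0.0


hex9_area_m2 = 105_000.0  # roughly one H3 resolution-9 cell
buildings = [(1_000.0, 70.0), (2_000.0, 35.0)]  # a 20-floor and a 10-floor building
print(far_proxy(buildings, hex9_area_m2))  # GFA of 40,000 m² over 105,000 m²
```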

**Phase 7: Proxy features**
- Daytime population (office×15 + retail×8 + F&B×5 + hotel×20 + school×25)
- Tourism intensity (hotels + attractions + landmarks)
- Income proxy (building height + nightlight + services)
- Footfall proxy (transit + places + nightlight + population)
- Night economy (bars + hospitality + nightlight)
- Pedestrian connectivity, mixed-use index, green ratio
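
The daytime-population proxy is a weighted sum. This sketch assumes each weight multiplies the region's count of that place type; the category keys (such as `fnb`) are hypothetical.

```python
# Per-place weights of the atlas's daytime-population proxy.
DAYTIME_WEIGHTS = {"office": 15, "retail": 8, "fnb": 5, "hotel": 20, "school": 25}


def daytime_population(counts: dict[str, int]) -> int:
    """Weighted sum of per-region place counts; categories without a weight are ignored."""
    return sum(DAYTIME_WEIGHTS[cat] * n for cat, n in counts.items() if cat in DAYTIME_WEIGHTS)


hex_counts = {"office": 10, "retail": 20, "fnb": 12, "hotel": 1, "school": 2, "park": 3}
print(daytime_population(hex_counts))  # → 440  (150 + 160 + 60 + 20 + 50)
```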

**Phase 8: Graph features (TPU only)**
- Queen contiguity adjacency (average 4.9 neighbors)
- Spatial lag + contrast for 12 core features
- Ring-1 neighbor sums
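
The graph features reduce to simple operations over an adjacency list. This sketch assumes the queen adjacency (TPUs sharing an edge or a corner) has already been derived from the polygons, for example with libpysal's Queen weights. Treating "contrast" as own value minus lag, and letting a unit with no neighbors (an outlying island) keep its own value as its lag, are assumptions.

```python
def spatial_lag(values: dict[str, float], adjacency: dict[str, list[str]]) -> dict[str, float]:
    """Row-standardized spatial lag: the mean value of each unit's queen neighbors.

    Units with no neighbors keep their own value (an assumed fallback).
    """
    return {
        unit: sum(values[n] for n in nbrs) / len(nbrs) if nbrs else values[unit]
        for unit, nbrs in adjacency.items()
    }


def contrast(values: dict[str, float], lag: dict[str, float]) -> dict[str, float]:
    """Own value minus neighborhood lag (positive = stands out from its surroundings)."""
    return {unit: values[unit] - lag[unit] for unit in lag}


def ring1_sum(values: dict[str, float], adjacency: dict[str, list[str]]) -> dict[str, float]:
    """Sum of values over the first ring of neighbors."""
    return {unit: sum(values[n] for n in nbrs) for unit, nbrs in adjacency.items()}


# Toy layout: TPU "A" borders "B" and "C"; "D" has no land neighbors.
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
vitality = {"A": 1.0, "B": 0.25, "C": 0.75, "D": 0.5}
lag = spatial_lag(vitality, adjacency)
print(lag["A"], contrast(vitality, lag)["A"])  # → 0.5 0.5
```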

**Phase 9: Composite scores + indices**
- Vitality, accessibility, demand, competition, growth potential, saturation
- Skyline, redevelopment pressure, verticality, compactness, mixed-height

**Phase 10: Archetype clustering**
- K-means (k=5) on 52 normalized features
- Archetypes: Tourist/Entertainment (12 TPUs), Dense Residential (30), Suburban Residential (31), New Development (80), Country/Green Belt (91)
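
A stdlib-only sketch of the clustering step follows. It assumes "normalized" means z-scored per feature and uses plain Lloyd iterations seeded with the first k rows. That seeding is a simplification; production runs typically use k-means++ seeding and several restarts (as in scikit-learn's `KMeans`).

```python
from statistics import mean, pstdev


def zscore_columns(rows: list[list[float]]) -> list[list[float]]:
    """Standardize each feature column to mean 0 and std 1; constant columns become 0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: list[list[float]], k: int, max_iter: int = 100):
    """Lloyd's algorithm seeded with the first k rows (a simplification of k-means++)."""
    centroids = [list(r) for r in rows[:k]]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Two obvious groups in a toy 2-feature space
rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, _ = kmeans(zscore_columns(rows), k=2)
print(labels)  # → [0, 0, 1, 1]
```

Normalizing first matters because raw features differ in scale (counts, meters, radiance); without it, the largest-magnitude feature would dominate the distances.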

**Phase 11: Hex-8 neighborhood layer**
- Hex-9 features aggregated upward
- TPU features broadcast downward (265 features)
- Neighbor influence: boundary/gateway/cluster-center flags, interface score, gradient position, net demand flow
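
The up-and-down movement of features between levels can be sketched with plain dictionaries. This assumes the mean as the upward aggregator (the atlas may use sums or other statistics per feature). The hex IDs are hypothetical stand-ins for real H3 cells, whose parents would come from H3 itself (for example `h3.cell_to_parent(cell, 8)` in h3-py v4).

```python
from collections import defaultdict
from statistics import mean


def aggregate_up(child_values: dict[str, float], parent_of: dict[str, str]) -> dict[str, float]:
    """Mean of hex-9 child values per hex-8 parent (the mean is an assumed aggregator)."""
    groups: dict[str, list[float]] = defaultdict(list)
    for child, value in child_values.items():
        groups[parent_of[child]].append(value)
    return {parent: mean(vals) for parent, vals in groups.items()}


def broadcast_down(tpu_values: dict[str, float], tpu_of: dict[str, str]) -> dict[str, float]:
    """Copy each TPU's value onto every hex assigned to that TPU."""
    return {hex_id: tpu_values[tpu] for hex_id, tpu in tpu_of.items()}


parent_of = {"c1": "p1", "c2": "p1", "c3": "p2"}  # hypothetical child -> parent IDs
print(aggregate_up({"c1": 2.0, "c2": 4.0, "c3": 5.0}, parent_of))  # → {'p1': 3.0, 'p2': 5.0}
print(broadcast_down({"TPU_A": 0.8}, {"p1": "TPU_A", "p2": "TPU_A"}))  # → {'p1': 0.8, 'p2': 0.8}
```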

### Place representation pipeline

**Step 1: Source extraction**
- Overture Maps: 141K places
- OSM: 47K POIs

**Step 2: Conflation + dedup**
- Spatial join (50 m radius) + name similarity
- 20K net-new places from OSM after deduplication
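
The conflation rule can be sketched as follows. Two records count as the same place when they lie within 50 m of each other and their names are similar. Here difflib's `SequenceMatcher` stands in for whatever name-similarity measure the atlas used, and the 0.8 threshold is an assumption. The brute-force pairing is only for illustration; at 141K × 47K records a spatial index is needed.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib, after casefolding and trimming."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def is_duplicate(p: dict, q: dict, radius_m: float = 50.0, min_similarity: float = 0.8) -> bool:
    """Same place if within radius_m AND names are similar (0.8 is an assumed threshold)."""
    if haversine_m(p["lat"], p["lng"], q["lat"], q["lng"]) > radius_m:
        return False
    return name_similarity(p["name"], q["name"]) >= min_similarity


def net_new_osm(overture: list[dict], osm: list[dict]) -> list[dict]:
    """OSM places with no Overture duplicate (brute force; use a spatial index at scale)."""
    return [q for q in osm if not any(is_duplicate(p, q) for p in overture)]


overture = [{"name": "Starbucks", "lat": 22.2810, "lng": 114.1580}]
osm = [
    {"name": "STARBUCKS ", "lat": 22.2812, "lng": 114.1581},  # ~25 m away, same name
    {"name": "Tsui Wah", "lat": 22.2811, "lng": 114.1580},    # ~11 m away, different name
]
print([q["name"] for q in net_new_osm(overture, osm)])  # → ['Tsui Wah']
```

Requiring both conditions keeps neighboring shops in the same building (close but differently named) from being merged.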

**Step 3: Category taxonomy**
- L1: 11 groups (food, retail, services, health, education, leisure, hospitality, transit, community, auto, other)
- L2: 30 groups (`cafe_coffee`, `grocery`, `general_dining`, `bar_nightlife`, etc.)

**Step 4: Chain detection**
- 151 known HK chain brands matched by name
- 10,474 outlets detected (McDonald's 880, Starbucks 729, 7-Eleven 428, etc.)
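
Name-based chain matching depends mostly on normalization. This sketch casefolds names, strips accents and apostrophes, and requires an alias to match the whole name or a whole-word prefix of it. The alias table is hypothetical; the atlas's 151-brand list is not reproduced here.

```python
import re
import unicodedata
from typing import Optional


def normalize_name(name: str) -> str:
    """Casefold, drop accents and apostrophes, turn other punctuation into spaces."""
    s = unicodedata.normalize("NFKD", name).casefold()
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    s = re.sub(r"['’]", "", s)     # McDonald's -> mcdonalds
    s = re.sub(r"[^\w]+", " ", s)  # 7-Eleven -> 7 eleven; \w also keeps CJK characters
    return " ".join(s.split())


# Hypothetical alias table; the atlas's 151-brand list is not reproduced here.
BRAND_ALIASES = {
    "mcdonalds": "McDonald's",
    "麥當勞": "McDonald's",
    "starbucks": "Starbucks",
    "7 eleven": "7-Eleven",
}


def detect_chain(name: str) -> Optional[str]:
    """Return the brand whose alias equals the name or is a whole-word prefix of it."""
    norm = normalize_name(name)
    for alias, brand in BRAND_ALIASES.items():
        if norm == alias or norm.startswith(alias + " "):
            return brand
    return None


for raw in ["McDonald's Mong Kok", "7-Eleven (Central)", "麥當勞 (旺角)", "Starbucksy Bar"]:
    print(raw, "->", detect_chain(raw))
```

The whole-word prefix rule is what keeps "Starbucksy Bar" from being counted as a Starbucks outlet.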

**Step 5: Spatial assignment**
- H3 res-9 and res-8 cells
- TPU and district via point-in-polygon
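
Point-in-polygon assignment can be illustrated with even-odd ray casting. This is a minimal sketch for a single outer ring. Real TPU boundaries are multipolygons that can contain holes, which a GIS library such as Shapely or a GeoPandas spatial join handles. The square TPU below is hypothetical.

```python
from typing import Optional

Ring = list[tuple[float, float]]  # polygon ring as (lng, lat) vertices


def point_in_ring(lng: float, lat: float, ring: Ring) -> bool:
    """Even-odd ray casting: count edge crossings of a ray pointing east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # this edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def assign_tpu(lng: float, lat: float, tpus: dict[str, Ring]) -> Optional[str]:
    """First TPU whose outer ring contains the point (holes and multipolygons ignored)."""
    for tpu_id, ring in tpus.items():
        if point_in_ring(lng, lat, ring):
            return tpu_id
    return None


# Hypothetical square TPU for illustration
tpus = {"TPU_X": [(114.15, 22.27), (114.17, 22.27), (114.17, 22.29), (114.15, 22.29)]}
print(assign_tpu(114.16, 22.28, tpus), assign_tpu(114.20, 22.28, tpus))  # → TPU_X None
```

H3 cell assignment needs no polygon test, because a cell ID is computed directly from the coordinates.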

**Step 6: Competition**
- cKDTree spatial self-join per L2 category
- Same-category count within 200 m and 500 m
- Nearest competitor distance
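
The atlas computes competition with a cKDTree self-join. The brute-force version below produces the same quantities for a handful of places without SciPy, which makes the definitions easy to check. It projects coordinates with an equirectangular approximation around Hong Kong's latitude, which is adequate at city scale. The `competitors_500m` key name and the toy places are assumptions.

```python
import math


def to_local_xy(lat: float, lng: float, lat0: float = 22.3) -> tuple[float, float]:
    """Equirectangular projection to meters around Hong Kong's latitude (fine at city scale)."""
    r = 6_371_000.0
    return (math.radians(lng) * r * math.cos(math.radians(lat0)), math.radians(lat) * r)


def competition_features(places: list[dict], radii: tuple[float, ...] = (200.0, 500.0)) -> list[dict]:
    """Per place: same-L2 counts within each radius and distance to the nearest same-L2 place."""
    xy = [to_local_xy(p["lat"], p["lng"]) for p in places]
    out = []
    for i, p in enumerate(places):
        dists = [math.dist(xy[i], xy[j]) for j, q in enumerate(places)
                 if j != i and q["l2"] == p["l2"]]
        feats = {f"competitors_{int(r)}m": sum(d <= r for d in dists) for r in radii}
        feats["nearest_competitor_m"] = min(dists) if dists else math.inf
        out.append(feats)
    return out


places = [
    {"l2": "cafe_coffee", "lat": 22.3000, "lng": 114.17},
    {"l2": "cafe_coffee", "lat": 22.3010, "lng": 114.17},  # ~111 m north
    {"l2": "cafe_coffee", "lat": 22.3040, "lng": 114.17},  # ~445 m north
    {"l2": "grocery", "lat": 22.3005, "lng": 114.17},
]
for feats in competition_features(places):
    print(feats)
```

The pairwise loop is O(n²) per category; a KD-tree reduces each radius query to roughly logarithmic time, which is why it is used at the 147K-place scale.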

**Step 7: Complementary**
- Cross-category count within 300 m using per-L1 KD-trees
- Complementary diversity (unique L1 categories within 300 m)
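
A brute-force sketch of the complementary features follows, assuming places are already projected to meters (`x`, `y`). It also assumes that "cross-category" excludes the place's own L1 group; the `complementary_count` key name is hypothetical.

```python
import math


def complementary_features(places: list[dict], radius_m: float = 300.0) -> list[dict]:
    """Per place: how many other-L1 places lie within radius_m, and how many distinct L1s they span.

    `places` carry projected coordinates `x`, `y` in meters and an `l1` label.
    """
    out = []
    for p in places:
        nearby = [
            q["l1"] for q in places
            if q is not p
            and q["l1"] != p["l1"]
            and math.hypot(q["x"] - p["x"], q["y"] - p["y"]) <= radius_m
        ]
        out.append({"complementary_count": len(nearby),
                    "complementary_diversity": len(set(nearby))})
    return out


places = [
    {"l1": "food", "x": 0.0, "y": 0.0},
    {"l1": "retail", "x": 100.0, "y": 0.0},
    {"l1": "retail", "x": 0.0, "y": 120.0},
    {"l1": "health", "x": 0.0, "y": 250.0},
    {"l1": "food", "x": 50.0, "y": 0.0},
    {"l1": "leisure", "x": 400.0, "y": 0.0},
]
print(complementary_features(places)[0])  # → {'complementary_count': 3, 'complementary_diversity': 2}
```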

**Step 8: Anchor proximity**
- 16 anchor types from OSM (MTR, schools, hospitals, hotels, malls, supermarkets, parks, bus stops, etc.)
- Per anchor: count within radius, nearest distance, boolean flag
- Composite `anchor_score` (weighted across all types)

**Step 9: Demand pull**
- 6 demand sources: office (nightlight-based), residential (WorldPop, 500 m), transit (MTR decay), hotel, school, mall
- Distance-decay weighting (exponential, 200 m half-life)
- Composite `demand_context_score`
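
With a 200 m half-life, a source at distance d contributes with weight w(d) = 0.5^(d/200), which is the same as exp(-ln 2 · d / 200). A sketch for one demand-source type follows. The `(magnitude, distance_m)` input and the ridership figures are hypothetical, and the weights used to combine the six sources into `demand_context_score` are not stated, so they are not shown.

```python
HALFLIFE_M = 200.0  # decay half-life stated for the atlas's demand pull


def decay_weight(distance_m: float, halflife_m: float = HALFLIFE_M) -> float:
    """Exponential distance decay: the weight halves every `halflife_m` meters."""
    return 0.5 ** (distance_m / halflife_m)


def demand_pull(sources: list[tuple[float, float]], halflife_m: float = HALFLIFE_M) -> float:
    """Decay-weighted sum over (magnitude, distance_m) pairs for one demand source type."""
    return sum(magnitude * decay_weight(d, halflife_m) for magnitude, d in sources)


# Hypothetical transit example: 10,000 riders at 0 m and 20,000 riders at 400 m
print(demand_pull([(10_000, 0.0), (20_000, 400.0)]))  # → 15000.0
```

At 400 m, two half-lives out, a source keeps a quarter of its weight.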

**Step 10: Co-location synergy**
- 10 category-pair synergies; each fires only for its target category
- Pairs: cafe×office, grocery×residential, convenience×transit, restaurant×hotel, gym×cafe, pharmacy×clinic, bar×restaurant, school×tutoring, bank×office, bakery×cafe
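
The "fires only for the target category" rule can be sketched as one feature per pair that is zero unless the place belongs to that pair's target. Several details here are assumptions: reading the first element of each listed pair as the target, the `syn_*` feature names, and the `partner_signal` input (a precomputed local intensity for each partner, such as office demand or nearby cafe density).

```python
# Pairs as listed in the atlas; treating the first element as the target is an assumption.
SYNERGY_PAIRS = [
    ("cafe", "office"), ("grocery", "residential"), ("convenience", "transit"),
    ("restaurant", "hotel"), ("gym", "cafe"), ("pharmacy", "clinic"),
    ("bar", "restaurant"), ("school", "tutoring"), ("bank", "office"), ("bakery", "cafe"),
]


def synergy_features(category: str, partner_signal: dict[str, float]) -> dict[str, float]:
    """One feature per pair; non-zero only when the place IS the pair's target category."""
    return {
        f"syn_{target}_x_{partner}": partner_signal.get(partner, 0.0) if category == target else 0.0
        for target, partner in SYNERGY_PAIRS
    }


feats = synergy_features("cafe", {"office": 0.8, "cafe": 0.5, "residential": 0.9})
print({k: v for k, v in feats.items() if v})  # → {'syn_cafe_x_office': 0.8}
```

A cafe gains nothing from the gym×cafe or bakery×cafe pairs: those describe what cafes do for gyms and bakeries, not the reverse.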

**Step 11: Building context**
- 12 features from the host hex-9: height, floors, FAR, podium, stacking, GHSL height, built fraction, land use, terrain, waterfront, hillside

**Step 12: Neighborhood character**
- 14 features broadcast from the parent TPU: archetype, vitality, demand, competition, accessibility, population density, daytime ratio, income proxy, cuisine diversity, nightlight trend, new development


## Context of usage

### Site selection: "Where should brand X open next?"

Query the place table for gaps: high `demand_context_score`, low `competitors_200m`, and a suitable archetype. The feature vector tells you not just where but why: which demand source (office workers, residents, or tourists) drives the opportunity.
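
The following sketch shows the site-selection filter. It assumes the place table is loaded as a list of dicts using the column names in this document; the `id`, `l2`, and `archetype` keys, the thresholds, and the sample rows are hypothetical. To answer the "why", it reports the largest `pull_*` column for each hit.

```python
from typing import Optional


def dominant_pull(place: dict) -> Optional[str]:
    """Name of the largest pull_* column, i.e. which demand source drives the site."""
    pulls = {k: v for k, v in place.items() if k.startswith("pull_")}
    return max(pulls, key=pulls.get) if pulls else None


def find_gaps(places: list[dict], target_l2: str, archetypes: set[str],
              min_demand: float = 0.7, max_competitors: int = 2) -> list[dict]:
    """Target-category places with strong demand and few rivals, best first.

    Thresholds are hypothetical and would be tuned per brand.
    """
    hits = [
        p for p in places
        if p["l2"] == target_l2
        and p["demand_context_score"] >= min_demand
        and p["competitors_200m"] <= max_competitors
        and p["archetype"] in archetypes
    ]
    return sorted(hits, key=lambda p: p["demand_context_score"], reverse=True)


places = [
    {"id": 1, "l2": "cafe_coffee", "demand_context_score": 0.9, "competitors_200m": 1,
     "archetype": "Dense Residential", "pull_office": 0.2, "pull_residential": 0.7},
    {"id": 2, "l2": "cafe_coffee", "demand_context_score": 0.8, "competitors_200m": 6,
     "archetype": "Dense Residential", "pull_office": 0.9, "pull_residential": 0.1},
    {"id": 3, "l2": "cafe_coffee", "demand_context_score": 0.75, "competitors_200m": 0,
     "archetype": "Tourist/Entertainment", "pull_office": 0.6, "pull_residential": 0.2},
]
for p in find_gaps(places, "cafe_coffee", {"Dense Residential", "Tourist/Entertainment"}):
    print(p["id"], dominant_pull(p))  # → 1 pull_residential, then 3 pull_office
```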

### Portfolio analytics: "How diversified is this REIT's portfolio?"

Map each property to a TPU archetype and measure concentration risk across the 5 archetypes. Flag properties in decline zones (`nl_growth_pct` < -20%) or in areas under high redevelopment pressure (`idx_redevelopment_pressure` > 0.7).
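
The text does not name a concentration measure. The sketch below uses the Herfindahl-Hirschman index (the sum of squared shares) as a stand-in and computes shares by property count rather than by asset value. It assumes `nl_growth_pct` is stored in percent units and applies the two thresholds from the text; the flag labels are hypothetical.

```python
from collections import Counter


def archetype_hhi(property_archetypes: list[str]) -> float:
    """Herfindahl-Hirschman index of archetype shares: 0.2 if spread evenly over 5, 1.0 if all in one."""
    n = len(property_archetypes)
    return sum((c / n) ** 2 for c in Counter(property_archetypes).values())


def risk_flags(row: dict) -> list[str]:
    """Thresholds from the text; nl_growth_pct is assumed to be stored in percent."""
    flags = []
    if row["nl_growth_pct"] < -20:
        flags.append("decline_zone")
    if row["idx_redevelopment_pressure"] > 0.7:
        flags.append("high_redevelopment")
    return flags


portfolio = ["Dense Residential"] * 3 + ["Tourist/Entertainment"]
print(archetype_hhi(portfolio))  # → 0.625  (0.75² + 0.25²)
print(risk_flags({"nl_growth_pct": -25.0, "idx_redevelopment_pressure": 0.4}))  # → ['decline_zone']
```

Weighting shares by asset value instead of property count is a one-line change if the portfolio data carries valuations.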

### Competitive landscape: "Who are my competitors and what's around them?"

For any place, `competitors_200m` gives the direct competition count, `nearest_competitor_m` gives breathing room, and `complementary_diversity` shows ecosystem richness. Compare these against archetype averages to see whether competition is above or below normal for that neighborhood type.

### Catchment analysis: "Who is my customer?"

`pull_residential` measures residential demand nearby, and `pull_office` shows whether office workers drive demand. `char_daytime_ratio` reveals whether the area is office-dominant (>1) or residential-dominant (<1). Census features add age, income, and ethnicity breakdowns.

### Growth corridor detection: "Where is the city growing?"

`nl_growth_corridor` flags hexes whose nightlight radiance has grown by more than 20%. `proxy_new_development` captures GHSL built-surface change. `score_growth_potential` combines both with a low-competition signal. These can be tracked quarterly as new monthly VIIRS composites are released.

### Neighborhood scoring: "Is this a walkable 15-minute neighborhood?"

At the hex-8 level, `r8_walkability` combines `transit_score`, connectivity, places, and anchors. Check `osm_*_dist_m` for distances to specific amenities. `r8_residential_quality` combines green cover, parks, schools, and low density.

### Micrograph construction: "Star diagram for any place"

Every place has a four-arm context: T1 (`transit_score`), competitors (`competitors_200m`), demand magnets (`complementary_*`), and anchor quality (`anchor_score`). Building context adds the vertical dimension.

### Urban planning: "Where is redevelopment pressure highest?"

`idx_redevelopment_pressure` combines old buildings (`bld_age_index`), low-rise stock (`bld_lowrise_ratio`), and high vitality (`score_vitality`). Cross-reference it with `census_oq_*` (housing tenure) to identify candidates for public housing renewal.
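
The text states that the index adds three components but gives no weights or scaling. The 0.7 threshold used for portfolio flags suggests a 0 to 1 range, so this sketch assumes each component is min-max scaled and takes an equal-weight mean. Both choices are assumptions, and the sample values are hypothetical.

```python
def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def redevelopment_pressure(age_index: float, lowrise_ratio: float, vitality: float) -> float:
    """Equal-weight mean of three components already scaled to [0, 1] (weights assumed)."""
    return (age_index + lowrise_ratio + vitality) / 3


# Hypothetical TPU: old stock, half low-rise, highest vitality of three TPUs after scaling
vitality_scaled = min_max([20.0, 55.0, 90.0])  # raw vitality for three TPUs
score = redevelopment_pressure(0.75, 0.5, vitality_scaled[2])
print(score, score > 0.7)  # → 0.75 True
```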


## Validation summary

| Test | Result |
|---|---|
| Sanity checks (41 tests) | 41/41 pass (100%) |
| Deep validation (30 tests) | 27/30 pass (90%) |
| Place logic checks (13 tests) | 13/13 pass (100%) |
| Demand pull tests (6 tests) | 6/6 pass (100%) |
| Census-WorldPop correlation | r = 0.96 |
| Population gap vs official | 1.9% |
| Place deduplication | 0 remaining duplicates |
| GHSL temporal monotonicity | 0 violations (2010 ≤ 2020 ≤ 2025) |
| Archetype spot checks | All 5 known locations correct |
| F&B vs FEHD ground truth | 2.46× ratio (expected 2–3×) |

## Files on atlas-1

/home/azureuser/digital-atlas-hkg/

data/outputs/
  tpu_features_final.parquet              292 × 538       1.0 MB
  h3_res8_features.parquet              1,896 × 486       2.3 MB

data/hex_v9/
  hkg_hex_v9_features_v4.parquet       10,928 × 205       6.6 MB

data/places_consolidated/
  hkg_places_final.parquet            147,191 × 142      32.4 MB

data/boundaries/
  hkg_tpu.geojson                     292 TPU polygons    38 MB
  hkg_districts.geojson               19 districts        470 KB
  hkg_sar.geojson                     territory           84 KB
  hkg_hex_v9_land.geojson             10,928 hex-9 polys  6.2 MB

data/serving/                         API-ready JSON      829 MB
  tpu.json, hex8.json, hex9.json, places.json, places_slim.json
  tpu_geo.geojson, hex8_geo.geojson, hex9_geo.geojson
  feature_catalog.json, archetypes.json, places_schema.json
  places_methodology.json, places_examples.json, places_stats.json
  place_representation.json, manifest.json, models/

model/
  v7_gap_model.pkl                    R²=0.923            3.4 MB
  v8_population_model.pkl             R²=0.819            7.8 MB

docs/
  HKG_DATASET_OVERVIEW.html           single-page summary
  HKG_ATLAS_COMBINED.html             7-tab full report
  PLACES_LAYER.md                     places documentation
  + 7 topic-specific HTML reports