# Digital Atlas Hong Kong — Method, Scale & Context of Usage

## What this is

A mathematical representation of Hong Kong's urban commercial structure. Every planning unit, every neighborhood, and every commercial place in the territory described by a structured feature vector — capturing what it is, what surrounds it, who needs it, and how it's changing.

Built on atlas-1 at `/home/azureuser/digital-atlas-hkg/`. Total footprint: 6.2 GB.

---

## Scale

### Territory coverage

- **Area:** 1,112 km² (292 Tertiary Planning Units)
- **Population:** 7.47 million (WorldPop corrected + Census 2021)
- **Commercial places:** 147,191 (Overture Maps + OpenStreetMap, deduplicated)
- **Chain brands:** 151 detected, 10,474 outlets
- **Data sources:** 20 independent layers
- **Temporal depth:** VIIRS nightlights 2023–2025 (37 months), GHSL built-up 2010/2020/2025

### Spatial resolution

| Level | Units | Features | Grain | Role |
|---|---:|---:|---|---|
| **TPU** | 292 | 538 | Planning unit (variable size) | Primary analytical unit — census-aligned, archetype-classified |
| **Hex-8** | 1,896 | 486 | ~460m edge (~0.74 km²) | Neighborhood — boundary/gateway/cluster detection |
| **Hex-9** | 10,928 | 205 | ~175m edge (~0.105 km²) | Street-level — satellite, terrain, land use, building structure |
| **Places** | 147,191 | 142 | Individual business | Micro-context — competition, demand pull, synergy, anchors |

### Feature count: 1,371 total across all levels

---

## Method

### Data acquisition (20 sources)

| Source | What | Records |
|---|---|---|
| Overture Maps 2026-04 | Places, buildings, roads, land use, water, infrastructure, divisions | 1.08M features |
| OpenStreetMap (34 layers) | MTR stations, bus stops, restaurants, schools, hospitals, parks, etc. | 46,728 POIs |
| Census 2021 (C&SD) | Population, age, sex, income, occupation, housing, commute by STPU | 274 × 208 columns |
| CSDI Planning Dept | 292 TPU boundary polygons | 292 polygons |
| NASA VIIRS Black Marble | Monthly nighttime radiance composites | 37 months |
| GHSL (EU JRC) | Built-up surface, population, building height, urbanization class | 2010/2020/2025 epochs |
| Copernicus GLO-30 | Digital elevation model at 30m | 2 tiles |
| ESA WorldCover 2021 | 10m land cover (11 classes) | 2 tiles |
| WorldPop 2020 | 100m population grid (corrected ×0.593) | Territory-wide |
| Hansen/UMD | Global forest change (treecover, loss, gain) | 30m |
| Sentinel-2 | Cloud-free optical composite (NDVI) | 6 scenes medianed |
| Meta HRSL | 30m demographic strata (7 age/gender layers) | Territory-wide |
| MTR opendata | Station routes, fares, LRT | 5 CSVs |
| KMB ETA API | 6,715 bus stops + routes | 3 JSONs |
| FEHD (data.gov.hk) | 17,188 licensed restaurant premises | F&B ground truth |
| EDB (data.gov.hk) | 3,486 schools with lat/lng | Education anchor |
| data.gov.hk bulk | Census, property, buildings, schools, health, transit | 118+ files |
| Serper.dev SERP | 161 HK mall tenant directory URLs | 161 malls |
| OSM HK PBF | Full territory OSM extract | 41 MB |
| Overture Divisions | SAR, districts, localities, microhoods | 441 features |

### Region representation pipeline

**Phase 1: Boundaries + grid**
- 292 TPU polygons from CSDI
- H3 resolution-9 hex grid clipped to land (10,928 cells, 1,124 km²)
- H3 resolution-8 parent aggregation (1,896 cells)
- Hex → TPU spatial assignment

**Phase 2: Structural features**
- Overture buildings (302K) → per-hex height/area/count
- Overture roads (452K segments) → per-hex length/density by class
- Overture land use (40K) → per-hex zoning composition (% residential/commercial/park)
- WorldCover → per-hex land cover binary flags (tree/built/water)
- DEM → elevation, hillside flag, terrain ruggedness
- Coastline → coast proximity, waterfront flag

**Phase 3: Satellite intelligence**
- VIIRS nightlights → per-hex radiance mean, per-capita, commercial indicator, growth corridors, decline zones, YoY change
- GHSL → built-up surface 2010/2020/2025 (age proxy), population, building height, urbanization class
- Hansen → treecover, forest loss/gain
- Sentinel-2 → NDVI vegetation index

**Phase 4: Place composition**
- 147K places spatially joined to hex-9 and TPU
- Per-region: L1 counts (11 categories), L2 counts (30 categories), top-20 fine categories
- Category entropy, cuisine diversity (22 types), chain ratio, brand penetration

**Phase 5: Census demographics**
- 206 features from Census 2021 STPU (166/292 TPUs matched)
- Age brackets, sex ratio, income by occupation, dwelling type, commute patterns, ethnicity, language, literacy

**Phase 6: Verticality**
- Estimated floors (GHSL height / 3.5m)
- FAR proxy (GFA / hex area)
- Stacking intensity (height / built fraction)
- Podium-tower signal, skyline prominence
- Industrial conversion detection

**Phase 7: Proxy features**
- Daytime population (office×15 + retail×8 + F&B×5 + hotel×20 + school×25)
- Tourism intensity (hotels + attractions + landmarks)
- Income proxy (building height + nightlight + services)
- Footfall proxy (transit + places + nightlight + pop)
- Night economy (bars + hospitality + nightlight)
- Pedestrian connectivity, mixed-use index, green ratio

**Phase 8: Graph features (TPU only)**
- Queen contiguity adjacency (avg 4.9 neighbors)
- Spatial lag + contrast for 12 core features
- Ring-1 neighbor sums

**Phase 9: Composite scores + indices**
- Vitality, accessibility, demand, competition, growth potential, saturation
- Skyline, redevelopment pressure, verticality, compactness, mixed-height

**Phase 10: Archetype clustering**
- K-means (k=5) on 52 normalized features
- Tourist/Entertainment (12 TPUs), Dense Residential (30), Suburban Residential (31), New Development (80), Country/Green Belt (91)

**Phase 11: Hex-8 neighborhood layer**
- Aggregated hex-9 features upward
- Broadcast TPU features downward (265 features)
- Neighbor influence: boundary/gateway/cluster center flags, interface score, gradient position, net demand flow

### Place representation pipeline

**Step 1: Source extraction**
- Overture Maps 141K places + OSM 47K POIs

**Step 2: Conflation + dedup**
- Spatial join (50m radius) + name similarity
- 20K net-new from OSM after deduplication

**Step 3: Category taxonomy**
- L1: 11 groups (food, retail, services, health, education, leisure, hospitality, transit, community, auto, other)
- L2: 30 groups (cafe_coffee, grocery, general_dining, bar_nightlife, etc.)

**Step 4: Chain detection**
- 151 known HK chain brands matched by name
- 10,474 outlets detected (McDonald's 880, Starbucks 729, 7-Eleven 428, etc.)

**Step 5: Spatial assignment**
- H3 res-9 + res-8, TPU + district via point-in-polygon

**Step 6: Competition**
- cKDTree spatial self-join per L2 category
- Same-category count within 200m and 500m
- Nearest competitor distance

**Step 7: Complementary**
- Cross-category count within 300m using per-L1 KD-trees
- Complementary diversity (unique L1 categories within 300m)

**Step 8: Anchor proximity**
- 16 anchor types from OSM (MTR, schools, hospitals, hotels, malls, supermarkets, parks, bus stops, etc.)
- Per-anchor: count within radius + nearest distance + boolean flag
- Composite `anchor_score` (weighted across all types)

**Step 9: Demand pull**
- 6 demand sources: office (nightlight-based), residential (WorldPop 500m), transit (MTR decay), hotel, school, mall
- Distance-decay weighted (exponential, halflife 200m)
- Composite `demand_context_score`

**Step 10: Co-location synergy**
- 10 category-pair synergies, each fires ONLY for the target category
- cafe×office, grocery×residential, convenience×transit, restaurant×hotel, gym×cafe, pharmacy×clinic, bar×restaurant, school×tutoring, bank×office, bakery×cafe

**Step 11: Building context**
- 12 features from host hex-9: height, floors, FAR, podium, stacking, GHSL height, built fraction, land use, terrain, waterfront, hillside

**Step 12: Neighborhood character**
- 14 features broadcast from parent TPU: archetype, vitality, demand, competition, accessibility, population density, daytime ratio, income proxy, cuisine diversity, nightlight trend, new development

---

## Context of usage

### Site selection — "Where should brand X open next?"

Query the place table for gaps: high `demand_context` + low `competitors_200m` + right archetype. The feature vector tells you not just WHERE but WHY — which demand source (office workers? residents? tourists?) drives the opportunity.

### Portfolio analytics — "How diversified is this REIT's portfolio?"

Map each property to a TPU archetype. Measure concentration risk across the 5 archetypes. Flag properties in decline zones (`nl_growth_pct` < -20%) or high-redevelopment areas (`idx_redevelopment_pressure` > 0.7).

### Competitive landscape — "Who are my competitors and what's around them?"

For any place: `competitors_200m` gives direct competition count, `nearest_competitor_m` gives breathing room, `complementary_diversity` shows ecosystem richness. Compare against archetype averages to know if competition is above or below normal for that neighborhood type.

### Catchment analysis — "Who is my customer?"
`pull_residential` tells you how many residents are nearby. `pull_office` tells you if office workers drive demand. `char_daytime_ratio` reveals if the area is office-dominant (>1) or residential-dominant (<1). Census features give age/income/ethnicity breakdown.

### Growth corridor detection — "Where is the city growing?"

`nl_growth_corridor` flags hexes brightening >20%. `proxy_new_development` shows GHSL built-surface change. `score_growth_potential` combines both with low-competition signal. Track these quarterly via VIIRS monthly updates.

### Neighborhood scoring — "Is this a walkable 15-minute neighborhood?"

Hex-8 level: `r8_walkability` combines transit_score + connectivity + places + anchors. Check `osm_*_dist_m` for specific amenity distances. `r8_residential_quality` combines green + parks + schools + low density.

### Micrograph construction — "Star diagram for any place"

Every place has the 4-arm context: T1 (`transit_score`), competitors (`competitors_200m`), demand magnets (`complementary_*`), anchor quality (`anchor_score`). Plus building context for vertical dimension.

### Urban planning — "Where is redevelopment pressure highest?"

`idx_redevelopment_pressure` = old buildings (`bld_age_index`) + low-rise (`bld_lowrise_ratio`) + high vitality (`score_vitality`). Cross with `census_oq_*` (housing tenure) to identify public housing renewal candidates.
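
The site-selection query described earlier in this section can be sketched in plain Python. This is a minimal illustration rather than the atlas's own query code: the `char_archetype` column name, the 0-to-1 score scale, the default thresholds, the `pull_transit` field and the sample rows are all assumptions made for the example. Only `demand_context_score`, `competitors_200m`, `pull_office` and `pull_residential` are names taken from the text above.

```python
def site_candidates(places, archetypes, min_demand=0.7, max_competitors=2):
    """Filter place rows to high-demand, low-competition gaps.

    Thresholds and the 'char_archetype' key are hypothetical.
    """
    hits = [
        p for p in places
        if p["char_archetype"] in archetypes
        and p["demand_context_score"] >= min_demand
        and p["competitors_200m"] <= max_competitors
    ]
    # Strongest demand first; fewer competitors breaks ties.
    return sorted(
        hits,
        key=lambda p: (-p["demand_context_score"], p["competitors_200m"]),
    )


def dominant_pull(place):
    """Name of the largest pull_* value: the WHY behind a candidate."""
    pulls = {k: v for k, v in place.items() if k.startswith("pull_")}
    return max(pulls, key=pulls.get)


# Synthetic rows for illustration only; not real atlas records.
places = [
    {"name": "A", "char_archetype": "Dense Residential",
     "demand_context_score": 0.82, "competitors_200m": 1,
     "pull_office": 0.1, "pull_residential": 0.7, "pull_transit": 0.3},
    {"name": "B", "char_archetype": "Dense Residential",
     "demand_context_score": 0.91, "competitors_200m": 6,
     "pull_office": 0.2, "pull_residential": 0.8, "pull_transit": 0.4},
    {"name": "C", "char_archetype": "Tourist/Entertainment",
     "demand_context_score": 0.88, "competitors_200m": 0,
     "pull_office": 0.6, "pull_residential": 0.2, "pull_transit": 0.5},
    {"name": "D", "char_archetype": "Country/Green Belt",
     "demand_context_score": 0.90, "competitors_200m": 0,
     "pull_office": 0.0, "pull_residential": 0.1, "pull_transit": 0.0},
]

candidates = site_candidates(
    places, archetypes={"Dense Residential", "Tourist/Entertainment"}
)
for p in candidates:
    print(p["name"], p["demand_context_score"], dominant_pull(p))
```

On the sample rows, B is dropped for having too many competitors and D for its archetype, leaving C (office-driven) ahead of A (resident-driven). The same filter would translate directly to a DataFrame query once the real column names are confirmed against `feature_catalog.json`.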
---

## Validation summary

| Test | Result |
|---|---|
| Sanity checks (41 tests) | 41/41 pass (100%) |
| Deep validation (30 tests) | 27/30 pass (90%) |
| Place logic checks (13 tests) | 13/13 pass (100%) |
| Demand pull tests (6 tests) | 6/6 pass (100%) |
| Census-WorldPop correlation | r=0.96 |
| Population gap vs official | 1.9% |
| Place deduplication | 0 remaining duplicates |
| GHSL temporal monotonicity | 0 violations (2010 ≤ 2020 ≤ 2025) |
| Archetype spot checks | All 5 known locations correct |
| F&B vs FEHD ground truth | 2.46x ratio (expected 2-3x) |

---

## Files on atlas-1

```
/home/azureuser/digital-atlas-hkg/
  data/outputs/
    tpu_features_final.parquet        292 × 538            1.0 MB
    h3_res8_features.parquet          1,896 × 486          2.3 MB
  data/hex_v9/
    hkg_hex_v9_features_v4.parquet    10,928 × 205         6.6 MB
  data/places_consolidated/
    hkg_places_final.parquet          147,191 × 142        32.4 MB
  data/boundaries/
    hkg_tpu.geojson                   292 TPU polygons     38 MB
    hkg_districts.geojson             19 districts         470 KB
    hkg_sar.geojson                   territory            84 KB
    hkg_hex_v9_land.geojson           10,928 hex-9 polys   6.2 MB
  data/serving/                       API-ready JSON       829 MB
    tpu.json, hex8.json, hex9.json, places.json, places_slim.json
    tpu_geo.geojson, hex8_geo.geojson, hex9_geo.geojson
    feature_catalog.json, archetypes.json, places_schema.json
    places_methodology.json, places_examples.json, places_stats.json
    place_representation.json, manifest.json, models/
  model/
    v7_gap_model.pkl                  R²=0.923             3.4 MB
    v8_population_model.pkl           R²=0.819             7.8 MB
  docs/
    HKG_DATASET_OVERVIEW.html         single-page summary
    HKG_ATLAS_COMBINED.html           7-tab full report
    PLACES_LAYER.md                   places documentation
    + 7 topic-specific HTML reports
```
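
As a quick consistency check on the listing above, the sketch below records each feature table's path and documented shape, then confirms that the per-level feature counts add up to the 1,371 total quoted under Scale. The dictionary keys and helper names are illustrative and not part of the atlas codebase. It also assumes the second number in each listed shape is the feature (column) count.

```python
from pathlib import PurePosixPath

# Root and relative paths are copied from the file listing. The keys
# ("tpu", "hex8", ...) and the helper names are illustrative assumptions.
ATLAS_ROOT = PurePosixPath("/home/azureuser/digital-atlas-hkg")

# level -> (relative path, (rows, feature columns)) as documented
FEATURE_TABLES = {
    "tpu": ("data/outputs/tpu_features_final.parquet", (292, 538)),
    "hex8": ("data/outputs/h3_res8_features.parquet", (1896, 486)),
    "hex9": ("data/hex_v9/hkg_hex_v9_features_v4.parquet", (10928, 205)),
    "places": (
        "data/places_consolidated/hkg_places_final.parquet",
        (147191, 142),
    ),
}


def table_path(level):
    """Absolute path of the feature table for a spatial level."""
    rel, _shape = FEATURE_TABLES[level]
    return ATLAS_ROOT / rel


def shape_matches(level, shape):
    """True if a loaded table's (rows, cols) equals the documented shape."""
    return tuple(shape) == FEATURE_TABLES[level][1]


def total_features():
    """Sum of feature columns across all four levels."""
    return sum(cols for _rel, (_rows, cols) in FEATURE_TABLES.values())


print(table_path("places"))
print(total_features())  # 538 + 486 + 205 + 142 = 1371
```

With pandas and a parquet engine installed, `pandas.read_parquet(str(table_path("tpu")))` should load the TPU table, and `shape_matches("tpu", df.shape)` can then confirm that the file on disk still has the documented dimensions.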