# Digital Atlas Hong Kong — Method, Scale & Context of Usage

## What this is

A mathematical representation of Hong Kong's urban commercial structure. Every planning unit, neighborhood, and commercial place in the territory is described by a structured feature vector that captures what it is, what surrounds it, who needs it, and how it is changing.

Built on atlas-1 at `/home/azureuser/digital-atlas-hkg/`. Total footprint: 6.2 GB.

---

## Scale

### Territory coverage
- **Area:** 1,112 km² (292 Tertiary Planning Units)
- **Population:** 7.47 million (WorldPop corrected + Census 2021)
- **Commercial places:** 147,191 (Overture Maps + OpenStreetMap, deduplicated)
- **Chain brands:** 151 detected, 10,474 outlets
- **Data sources:** 20 independent layers
- **Temporal depth:** VIIRS nightlights 2023–2025 (37 months), GHSL built-up 2010/2020/2025

### Spatial resolution

| Level | Units | Features | Grain | Role |
|---|---:|---:|---|---|
| **TPU** | 292 | 538 | Planning unit (variable size) | Primary analytical unit — census-aligned, archetype-classified |
| **Hex-8** | 1,896 | 486 | ~460m edge (~0.74 km²) | Neighborhood — boundary/gateway/cluster detection |
| **Hex-9** | 10,928 | 205 | ~175m edge (~0.105 km²) | Street-level — satellite, terrain, land use, building structure |
| **Places** | 147,191 | 142 | Individual business | Micro-context — competition, demand pull, synergy, anchors |

### Feature count: 1,371 total across all levels

---

## Method

### Data acquisition (20 sources)

| Source | Contents | Records / notes |
|---|---|---|
| Overture Maps 2026-04 | Places, buildings, roads, land use, water, infrastructure, divisions | 1.08M features |
| OpenStreetMap (34 layers) | MTR stations, bus stops, restaurants, schools, hospitals, parks, etc. | 46,728 POIs |
| Census 2021 (C&SD) | Population, age, sex, income, occupation, housing, commute by STPU | 274 rows × 208 columns |
| CSDI Planning Dept | 292 TPU boundary polygons | 292 polygons |
| NASA VIIRS Black Marble | Monthly nighttime radiance composites | 37 months |
| GHSL (EU JRC) | Built-up surface, population, building height, urbanization class | 2010/2020/2025 epochs |
| Copernicus GLO-30 | Digital elevation model at 30m | 2 tiles |
| ESA WorldCover 2021 | 10m land cover (11 classes) | 2 tiles |
| WorldPop 2020 | 100m population grid (corrected ×0.593) | Territory-wide |
| Hansen/UMD | Global forest change (treecover, loss, gain) | 30m |
| Sentinel-2 | Cloud-free optical composite (NDVI) | 6 scenes medianed |
| Meta HRSL | 30m demographic strata (7 age/gender layers) | Territory-wide |
| MTR opendata | Station routes, fares, LRT | 5 CSVs |
| KMB ETA API | 6,715 bus stops + routes | 3 JSONs |
| FEHD (data.gov.hk) | 17,188 licensed restaurant premises | F&B ground truth |
| EDB (data.gov.hk) | 3,486 schools with lat/lng | Education anchor |
| data.gov.hk bulk | Census, property, buildings, schools, health, transit | 118+ files |
| Serper.dev SERP | 161 HK mall tenant directory URLs | 161 malls |
| OSM HK PBF | Full territory OSM extract | 41 MB |
| Overture Divisions | SAR, districts, localities, microhoods | 441 features |

### Region representation pipeline

**Phase 1: Boundaries + grid**
- 292 TPU polygons from CSDI
- H3 resolution-9 hex grid clipped to land (10,928 cells, 1,124 km²)
- H3 resolution-8 parent aggregation (1,896 cells)
- Hex → TPU spatial assignment

**Phase 2: Structural features**
- Overture buildings (302K) → per-hex height/area/count
- Overture roads (452K segments) → per-hex length/density by class
- Overture land use (40K) → per-hex zoning composition (% residential/commercial/park)
- WorldCover → per-hex land cover binary flags (tree/built/water)
- DEM → elevation, hillside flag, terrain ruggedness
- Coastline → coast proximity, waterfront flag

**Phase 3: Satellite intelligence**
- VIIRS nightlights → per-hex radiance mean, per-capita, commercial indicator, growth corridors, decline zones, YoY change
- GHSL → built-up surface 2010/2020/2025 (age proxy), population, building height, urbanization class
- Hansen → treecover, forest loss/gain
- Sentinel-2 → NDVI vegetation index

**Phase 4: Place composition**
- 147K places spatially joined to hex-9 and TPU
- Per-region: L1 counts (11 categories), L2 counts (30 categories), top-20 fine categories
- Category entropy, cuisine diversity (22 types), chain ratio, brand penetration
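
The composition metrics above can be sketched as follows. This is a minimal illustration, assuming category entropy means Shannon entropy in bits over the category counts of one region and chain ratio means the share of places flagged as chain outlets; the source does not state the log base or any normalization, and the function names are hypothetical.

```python
import math
from collections import Counter

def category_entropy(categories):
    """Shannon entropy (bits) of the category mix in one region."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def chain_ratio(is_chain_flags):
    """Share of places in the region flagged as chain outlets."""
    flags = list(is_chain_flags)
    return sum(flags) / len(flags) if flags else 0.0

# Hypothetical hex with four places split evenly across two L1 categories.
hex_categories = ["food", "food", "retail", "retail"]
hex_chain_flags = [True, False, False, False]

entropy_bits = category_entropy(hex_categories)  # even two-way split
ratio = chain_ratio(hex_chain_flags)             # one chain outlet in four
```

A region dominated by a single category scores near zero, while an even mix across all 11 L1 categories would reach the maximum of log2(11) bits.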

**Phase 5: Census demographics**
- 206 features from Census 2021 STPU (166/292 TPUs matched)
- Age brackets, sex ratio, income by occupation, dwelling type, commute patterns, ethnicity, language, literacy

**Phase 6: Verticality**
- Estimated floors (GHSL height / 3.5m)
- FAR proxy (GFA / hex area)
- Stacking intensity (height / built fraction)
- Podium-tower signal, skyline prominence
- Industrial conversion detection
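
A rough sketch of the three ratio-based verticality features, using the formulas listed above. The GFA estimate (footprint area times estimated floors) and the example values are assumptions for illustration; the source only gives the ratios themselves.

```python
FLOOR_HEIGHT_M = 3.5  # storey height stated in the pipeline description

def estimated_floors(ghsl_height_m):
    """Approximate storey count from GHSL building height."""
    return ghsl_height_m / FLOOR_HEIGHT_M

def far_proxy(gross_floor_area_m2, hex_area_m2):
    """Floor-area-ratio proxy: gross floor area over hex area."""
    return gross_floor_area_m2 / hex_area_m2 if hex_area_m2 > 0 else 0.0

def stacking_intensity(height_m, built_fraction):
    """Height relative to the share of the hex that is built up."""
    return height_m / built_fraction if built_fraction > 0 else 0.0

# Hypothetical hex-9 cell: ~0.105 km², 20% built up, mean height 35 m.
hex_area_m2 = 105_000.0
built_fraction = 0.2
height_m = 35.0

floors = estimated_floors(height_m)
# Assumption: GFA approximated as footprint area times estimated floors.
gfa_m2 = hex_area_m2 * built_fraction * floors
far = far_proxy(gfa_m2, hex_area_m2)
stacking = stacking_intensity(height_m, built_fraction)
```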

**Phase 7: Proxy features**
- Daytime population (office×15 + retail×8 + F&B×5 + hotel×20 + school×25)
- Tourism intensity (hotels + attractions + landmarks)
- Income proxy (building height + nightlight + services)
- Footfall proxy (transit + places + nightlight + pop)
- Night economy (bars + hospitality + nightlight)
- Pedestrian connectivity, mixed-use index, green ratio
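
The daytime population proxy is a weighted sum of place counts, so it is easy to reproduce. A minimal sketch using the weights listed above; the dictionary keys and the example counts are hypothetical, not the column names used in the dataset.

```python
# Persons attributed to each place type, as listed in the proxy formula.
DAYTIME_WEIGHTS = {"office": 15, "retail": 8, "fnb": 5, "hotel": 20, "school": 25}

def daytime_population(place_counts):
    """Weighted sum of place counts; missing types count as zero."""
    return sum(weight * place_counts.get(kind, 0)
               for kind, weight in DAYTIME_WEIGHTS.items())

# Hypothetical hex: 10 offices, 5 shops, 4 F&B outlets, 1 hotel, 2 schools.
example_counts = {"office": 10, "retail": 5, "fnb": 4, "hotel": 1, "school": 2}
daytime_pop = daytime_population(example_counts)
```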

**Phase 8: Graph features (TPU only)**
- Queen contiguity adjacency (avg 4.9 neighbors)
- Spatial lag + contrast for 12 core features
- Ring-1 neighbor sums
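
The graph features can be illustrated on a toy adjacency list. This sketch assumes spatial lag is the unweighted mean of a feature over contiguous neighbors and contrast is the unit's own value minus that lag; the source does not spell out either definition, and the adjacency and values below are invented.

```python
def spatial_lag(values, adjacency, unit):
    """Mean feature value over the unit's contiguous neighbors."""
    neighbors = adjacency.get(unit, [])
    if not neighbors:
        return 0.0
    return sum(values[n] for n in neighbors) / len(neighbors)

def ring1_sum(values, adjacency, unit):
    """Sum of the feature over first-ring neighbors."""
    return sum(values[n] for n in adjacency.get(unit, []))

def contrast(values, adjacency, unit):
    """How far the unit stands above (+) or below (-) its neighbors."""
    return values[unit] - spatial_lag(values, adjacency, unit)

# Hypothetical three-TPU graph: A touches B and C.
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
place_density = {"A": 10.0, "B": 4.0, "C": 6.0}

lag_a = spatial_lag(place_density, adjacency, "A")
contrast_a = contrast(place_density, adjacency, "A")
contrast_b = contrast(place_density, adjacency, "B")
ring_a = ring1_sum(place_density, adjacency, "A")
```

A positive contrast marks a unit that is denser than its surroundings; a negative one marks a unit sitting next to a stronger neighbor.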

**Phase 9: Composite scores + indices**
- Vitality, accessibility, demand, competition, growth potential, saturation
- Skyline, redevelopment pressure, verticality, compactness, mixed-height

**Phase 10: Archetype clustering**
- K-means (k=5) on 52 normalized features
- Tourist/Entertainment (12 TPUs), Dense Residential (30), Suburban Residential (31), New Development (80), Country/Green Belt (91)
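
For readers unfamiliar with the clustering step, here is a dependency-free sketch of min-max normalization followed by Lloyd's k-means. It is not the pipeline's implementation: the source does not say which library, normalization, or initialization was used, so the first-k-rows initialization and the two toy features below are assumptions chosen to keep the example deterministic.

```python
def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the closest centroid."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans(rows, k, iterations=20):
    """Lloyd's algorithm with a deterministic first-k-rows initialization."""
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for row in rows:
            clusters[nearest(row, centroids)].append(row)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(row, centroids) for row in rows], centroids

# Hypothetical units described by two features (place density, nightlight).
raw = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(min_max_normalize(raw), k=2)
```

The two quiet units end up in one cluster and the two busy ones in the other; archetype names are then assigned by inspecting each cluster's centroid.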

**Phase 11: Hex-8 neighborhood layer**
- Aggregated hex-9 features upward
- Broadcast TPU features downward (265 features)
- Neighbor influence: boundary/gateway/cluster center flags, interface score, gradient position, net demand flow

### Place representation pipeline

**Step 1: Source extraction**
- Overture Maps 141K places + OSM 47K POIs

**Step 2: Conflation + dedup**
- Spatial join (50m radius) + name similarity
- 20K net-new from OSM after deduplication
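
The dedup rule (within 50 m and similar name) might look like the sketch below. The 0.8 name-similarity threshold, the use of `difflib.SequenceMatcher`, and the record layout are assumptions; the source only states the 50 m radius and that name similarity is used.

```python
import math
from difflib import SequenceMatcher

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def is_duplicate(a, b, radius_m=50.0, min_name_similarity=0.8):
    """True when two records are close together and similarly named."""
    close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= radius_m
    similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return close and similarity >= min_name_similarity

# Hypothetical records: one Overture place and two OSM candidates.
overture_place = {"name": "Harbour Cafe", "lat": 22.2800, "lon": 114.1580}
osm_near = {"name": "harbour cafe", "lat": 22.2801, "lon": 114.1581}
osm_far = {"name": "Harbour Cafe", "lat": 22.2900, "lon": 114.1580}

near_is_dup = is_duplicate(overture_place, osm_near)
far_is_dup = is_duplicate(overture_place, osm_far)
```

An OSM record that matches no Overture place under this rule would count as net-new.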

**Step 3: Category taxonomy**
- L1: 11 groups (food, retail, services, health, education, leisure, hospitality, transit, community, auto, other)
- L2: 30 groups (cafe_coffee, grocery, general_dining, bar_nightlife, etc.)

**Step 4: Chain detection**
- 151 known HK chain brands matched by name
- 10,474 outlets detected (McDonald's 880, Starbucks 729, 7-Eleven 428, etc.)

**Step 5: Spatial assignment**
- H3 res-9 + res-8, TPU + district via point-in-polygon

**Step 6: Competition**
- cKDTree spatial self-join per L2 category
- Same-category count within 200m and 500m
- Nearest competitor distance
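
The competition features can be sketched without a spatial index. The pipeline uses a cKDTree self-join; the brute-force loop below is a stand-in that produces the same kind of output on small inputs. It assumes coordinates are already projected to metres, and the `xy`, `l2`, and `competitors_500m` names are hypothetical.

```python
import math
from collections import defaultdict

def competition_features(places, radii_m=(200, 500)):
    """Per-place same-category counts within each radius, plus nearest distance."""
    by_category = defaultdict(list)
    for place in places:
        by_category[place["l2"]].append(place)

    features = {}
    for group in by_category.values():
        for place in group:
            # Distances to every other place in the same L2 category.
            dists = [math.dist(place["xy"], other["xy"])
                     for other in group if other is not place]
            row = {f"competitors_{r}m": sum(d <= r for d in dists) for r in radii_m}
            row["nearest_competitor_m"] = min(dists) if dists else None
            features[place["id"]] = row
    return features

# Hypothetical places on a line, coordinates in metres.
places = [
    {"id": "a", "l2": "cafe_coffee", "xy": (0.0, 0.0)},
    {"id": "b", "l2": "cafe_coffee", "xy": (150.0, 0.0)},
    {"id": "c", "l2": "cafe_coffee", "xy": (400.0, 0.0)},
    {"id": "d", "l2": "grocery", "xy": (10.0, 0.0)},
]
competition = competition_features(places)
```

The grocery store has no same-category rival, so its nearest-competitor distance is left empty rather than set to zero.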

**Step 7: Complementary**
- Cross-category count within 300m using per-L1 KD-trees
- Complementary diversity (unique L1 categories within 300m)

**Step 8: Anchor proximity**
- 16 anchor types from OSM (MTR, schools, hospitals, hotels, malls, supermarkets, parks, bus stops, etc.)
- Per-anchor: count within radius + nearest distance + boolean flag
- Composite anchor_score (weighted across all types)

**Step 9: Demand pull**
- 6 demand sources: office (nightlight-based), residential (WorldPop 500m), transit (MTR decay), hotel, school, mall
- Distance-decay weighted (exponential, half-life 200 m)
- Composite demand_context_score
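
An exponential decay with a 200 m half-life means a source's weight halves every 200 m. A minimal sketch of that weighting; how the six sources are scaled and combined into the composite score is not stated in the source, so the plain sum below is an assumption.

```python
HALF_LIFE_M = 200.0  # half-life stated in the pipeline description

def decay_weight(distance_m, half_life_m=HALF_LIFE_M):
    """Weight that halves every `half_life_m` metres."""
    return 0.5 ** (distance_m / half_life_m)

def demand_pull(sources):
    """Sum of source magnitudes, each discounted by distance.

    `sources` is an iterable of (magnitude, distance_m) pairs.
    """
    return sum(magnitude * decay_weight(distance_m)
               for magnitude, distance_m in sources)

# Hypothetical demand sources around one place.
nearby_sources = [(1000.0, 200.0), (400.0, 400.0)]
pull = demand_pull(nearby_sources)  # 1000 * 0.5 + 400 * 0.25
```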

**Step 10: Co-location synergy**
- 10 category-pair synergies, each fires ONLY for the target category
- cafe×office, grocery×residential, convenience×transit, restaurant×hotel, gym×cafe, pharmacy×clinic, bar×restaurant, school×tutoring, bank×office, bakery×cafe
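
The "fires only for the target category" rule is easiest to see in code. This sketch assumes the first name in each pair is the target and the second is the partner it benefits from, and that the synergy value is simply the nearby partner count; both are assumptions, as is the dictionary layout.

```python
# target category -> partner category (first name in each pair is the target)
SYNERGY_PAIRS = {
    "cafe": "office",
    "grocery": "residential",
    "convenience": "transit",
    "restaurant": "hotel",
    "gym": "cafe",
    "pharmacy": "clinic",
    "bar": "restaurant",
    "school": "tutoring",
    "bank": "office",
    "bakery": "cafe",
}

def synergy_score(place_category, nearby_counts):
    """Nearby partner count for target categories; zero for everything else."""
    partner = SYNERGY_PAIRS.get(place_category)
    if partner is None:
        return 0
    return nearby_counts.get(partner, 0)

# A cafe near 12 offices scores; an office near 5 cafes does not,
# because "office" is only ever a partner, never a target.
cafe_score = synergy_score("cafe", {"office": 12, "cafe": 3})
office_score = synergy_score("office", {"cafe": 5})
```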

**Step 11: Building context**
- 12 features from host hex-9, including height, floors, FAR, podium, stacking, GHSL height, built fraction, land use, terrain, waterfront, hillside

**Step 12: Neighborhood character**
- 14 features broadcast from parent TPU, including archetype, vitality, demand, competition, accessibility, population density, daytime ratio, income proxy, cuisine diversity, nightlight trend, new development

---

## Context of usage

### Site selection — "Where should brand X open next?"
Query the place table for gaps: high demand_context_score, low competitors_200m, and a suitable archetype. The feature vector tells you not just where but why: which demand source (office workers, residents, or tourists) drives the opportunity.
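
A gap query of this kind might be sketched as a simple filter. The thresholds, the `archetype` key, and the 0 to 1 score scale are hypothetical; in practice the rows would come from `hkg_places_final.parquet` rather than an inline list.

```python
def find_gaps(places, min_demand, max_competitors, archetypes):
    """Places with strong demand, few rivals, and a wanted archetype."""
    matches = [
        p for p in places
        if p["demand_context_score"] >= min_demand
        and p["competitors_200m"] <= max_competitors
        and p["archetype"] in archetypes
    ]
    # Strongest demand first.
    return sorted(matches, key=lambda p: p["demand_context_score"], reverse=True)

# Hypothetical rows from the place table.
places = [
    {"id": "p1", "demand_context_score": 0.90, "competitors_200m": 1, "archetype": "Dense Residential"},
    {"id": "p2", "demand_context_score": 0.80, "competitors_200m": 9, "archetype": "Dense Residential"},
    {"id": "p3", "demand_context_score": 0.30, "competitors_200m": 0, "archetype": "Dense Residential"},
    {"id": "p4", "demand_context_score": 0.95, "competitors_200m": 2, "archetype": "Country/Green Belt"},
    {"id": "p5", "demand_context_score": 0.70, "competitors_200m": 2, "archetype": "Tourist/Entertainment"},
]
gaps = find_gaps(places, min_demand=0.6, max_competitors=3,
                 archetypes={"Dense Residential", "Tourist/Entertainment"})
gap_ids = [p["id"] for p in gaps]
```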

### Portfolio analytics — "How diversified is this REIT's portfolio?"
Map each property to a TPU archetype. Measure concentration risk across the 5 archetypes. Flag properties in decline zones (nl_growth_pct < -20%) or high-redevelopment areas (idx_redevelopment_pressure > 0.7).
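
One way to put a number on concentration risk is a Herfindahl-Hirschman index over archetype shares. The index is a choice made for this sketch, not something the atlas prescribes; the flag thresholds are the ones quoted above, and the property records are invented.

```python
from collections import Counter

def archetype_concentration(properties):
    """Herfindahl-Hirschman index of archetype shares (1.0 = all in one)."""
    counts = Counter(p["archetype"] for p in properties)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) ** 2 for n in counts.values())

def risk_flags(prop):
    """Flags based on the thresholds quoted in the text."""
    flags = []
    if prop["nl_growth_pct"] < -20:
        flags.append("decline_zone")
    if prop["idx_redevelopment_pressure"] > 0.7:
        flags.append("high_redevelopment")
    return flags

# Hypothetical four-property portfolio.
portfolio = [
    {"archetype": "Dense Residential", "nl_growth_pct": 5.0, "idx_redevelopment_pressure": 0.2},
    {"archetype": "Dense Residential", "nl_growth_pct": -25.0, "idx_redevelopment_pressure": 0.8},
    {"archetype": "Dense Residential", "nl_growth_pct": 0.0, "idx_redevelopment_pressure": 0.75},
    {"archetype": "Tourist/Entertainment", "nl_growth_pct": 12.0, "idx_redevelopment_pressure": 0.1},
]
hhi = archetype_concentration(portfolio)
flags = [risk_flags(p) for p in portfolio]
```

With five archetypes, a perfectly even portfolio would score 0.2, so values well above that indicate concentration.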

### Competitive landscape — "Who are my competitors and what's around them?"
For any place: competitors_200m gives direct competition count, nearest_competitor_m gives breathing room, complementary_diversity shows ecosystem richness. Compare against archetype averages to know if competition is above or below normal for that neighborhood type.

### Catchment analysis — "Who is my customer?"
pull_residential tells you how many residents are nearby. pull_office tells you if office workers drive demand. char_daytime_ratio reveals if the area is office-dominant (>1) or residential-dominant (<1). Census features give age/income/ethnicity breakdown.

### Growth corridor detection — "Where is the city growing?"
nl_growth_corridor flags hexes brightening >20%. proxy_new_development shows GHSL built-surface change. score_growth_potential combines both with low-competition signal. Track these quarterly via VIIRS monthly updates.
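
As an illustration of the 20% brightening rule, the sketch below compares the mean radiance of the first and last 12 months of a monthly series. That windowing is an assumption; the source does not say how the growth percentage is computed from the 37 monthly composites. The decline label mirrors the -20% threshold quoted under portfolio analytics.

```python
def nl_growth_pct(monthly_radiance, window=12):
    """Percent change between the first and last `window` months."""
    if len(monthly_radiance) < 2 * window:
        raise ValueError("need at least two full windows of monthly data")
    early = sum(monthly_radiance[:window]) / window
    late = sum(monthly_radiance[-window:]) / window
    if early <= 0:
        return None  # no baseline signal to compare against
    return (late - early) / early * 100.0

def classify_trend(growth_pct, threshold_pct=20.0):
    """Label a hex using the +/-20% thresholds quoted in the text."""
    if growth_pct is None:
        return "unknown"
    if growth_pct > threshold_pct:
        return "growth_corridor"
    if growth_pct < -threshold_pct:
        return "decline_zone"
    return "stable"

# Hypothetical 37-month radiance series for two hexes.
brightening = [10.0] * 12 + [11.0] * 13 + [13.0] * 12
dimming = [10.0] * 12 + [9.0] * 13 + [7.0] * 12

bright_label = classify_trend(nl_growth_pct(brightening))
dim_label = classify_trend(nl_growth_pct(dimming))
```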

### Neighborhood scoring — "Is this a walkable 15-minute neighborhood?"
Hex-8 level: r8_walkability combines transit_score + connectivity + places + anchors. Check osm_*_dist_m for specific amenity distances. r8_residential_quality combines green + parks + schools + low density.

### Micrograph construction — "Star diagram for any place"
Every place has a 4-arm context: T1 (transit_score), competitors (competitors_200m), demand magnets (complementary_*), and anchor quality (anchor_score). Building context adds the vertical dimension.

### Urban planning — "Where is redevelopment pressure highest?"
idx_redevelopment_pressure = old buildings (bld_age_index) + low-rise (bld_lowrise_ratio) + high vitality (score_vitality). Cross with census_oq_* (housing tenure) to identify public housing renewal candidates.

---

## Validation summary

| Test | Result |
|---|---|
| Sanity checks (41 tests) | 41/41 pass (100%) |
| Deep validation (30 tests) | 27/30 pass (90%) |
| Place logic checks (13 tests) | 13/13 pass (100%) |
| Demand pull tests (6 tests) | 6/6 pass (100%) |
| Census-WorldPop correlation | r=0.96 |
| Population gap vs official | 1.9% |
| Place deduplication | 0 remaining duplicates |
| GHSL temporal monotonicity | 0 violations (2010 ≤ 2020 ≤ 2025) |
| Archetype spot checks | All 5 known locations correct |
| F&B vs FEHD ground truth | 2.46x ratio (expected 2-3x) |

---

## Files on atlas-1

```
/home/azureuser/digital-atlas-hkg/

data/outputs/
  tpu_features_final.parquet              292 × 538       1.0 MB
  h3_res8_features.parquet              1,896 × 486       2.3 MB

data/hex_v9/
  hkg_hex_v9_features_v4.parquet       10,928 × 205       6.6 MB

data/places_consolidated/
  hkg_places_final.parquet            147,191 × 142      32.4 MB

data/boundaries/
  hkg_tpu.geojson                     292 TPU polygons    38 MB
  hkg_districts.geojson               19 districts        470 KB
  hkg_sar.geojson                     territory           84 KB
  hkg_hex_v9_land.geojson             10,928 hex-9 polys  6.2 MB

data/serving/                         API-ready JSON      829 MB
  tpu.json, hex8.json, hex9.json, places.json, places_slim.json
  tpu_geo.geojson, hex8_geo.geojson, hex9_geo.geojson
  feature_catalog.json, archetypes.json, places_schema.json
  places_methodology.json, places_examples.json, places_stats.json
  place_representation.json, manifest.json, models/

model/
  v7_gap_model.pkl                    R²=0.923            3.4 MB
  v8_population_model.pkl             R²=0.819            7.8 MB

docs/
  HKG_DATASET_OVERVIEW.html           single-page summary
  HKG_ATLAS_COMBINED.html             7-tab full report
  PLACES_LAYER.md                     places documentation
  + 7 topic-specific HTML reports
```
