SGP Digital Atlas

Features + Plexis Graph + Embeddings — Final Report
Propheus · April 2026 · 20 data sources · 5.98M population · 174,711 places
20 Data Sources: Gov + commercial + satellite + live API
5.98M Population: 4.21M resident + 1.77M non-resident
637 Hex-8 Features: 1,191 neighborhoods
612 Hex-9 Features: 7,318 cells
114 Place Features: 174,711 businesses
~449 Subzone Features: 326 URA zones

Data Sources

Source | Records | Features derived
Overture Maps + OSM | 174,711 places | 79 place composition + 114 per-place
Overture Buildings | 377,331 | 16 built environment
LTA Stations + Bus Stops | 231 MRT + 44 LRT + 5,177 bus | 18 transit + 8 GTFS + 19 place anchors
LTA Ridership (hourly) | 12.3M taps/day | 7 temporal transit (AM/PM/off/night)
Singapore GTFS 2026 | 230,914 trips | 8 frequency features (headway by window)
OSM Road Network | 550,991 segments | 26 walkability (network walk from 214K-node graph)
SingStat Population | 5,982,320 | 18 demographics + 12 dwelling type
HDB Resale Transactions | 227,207 | 2 property price features
URA Master Plan | 113,212 parcels | 12 land use
NASA VIIRS Nightlights | 2 epochs (2022+2024) | 7 nightlight features (growth, commercial indicator)
WorldPop + WorldCover | Grid rasters | 5 satellite (pop growth, land cover)
LTA DataMall API (live) | 3K taxis + 2.6K carparks + 50K speed bands | 10 dynamic (taxi, carpark, congestion, bus services)
LTA Bus Routes | 26,711 route-stops (789 services) | Bus network topology + connectivity
Government amenity datasets | ~3,000 (hawkers, clinics, parks, schools, hotels) | 16 amenity counts + distance features
OSM POIs (4 layers) | 52,317 | 4 supplementary counts

Feature Pillars — Hex-8 (637 features)

Pillar | # | Source | What it captures
Demographics | 18 | SingStat | Population total (5.98M), elderly %, non-resident, daytime intensity
Dwelling types | 12 | SingStat | HDB 1-5rm, condo, landed, by floor area
Built environment | 16 | Overture | Buildings, HDB blocks, floors, commercial/industrial
Land use | 12 | URA | Zoning %, entropy, fragmentation, dominant use
Transit | 18 | LTA | MRT/bus counts, taps (total + AM/PM/off/night)
GTFS frequency | 8 | GTFS 2026 | Headway by time window, routes, departures
Walkability | 26 | OSM graph | Euclidean + network walk to 6 amenities, detour ratios
Amenities | 16 | Gov data | Hawkers, clinics, parks, supermarkets, hotels, schools
Place composition | 79 | Overture+OSM | 24 categories, tiers, entropy, HHI, brands
Demand pull | 12 | Computed | 6 pull scores (office/residential/transit/hotel/school/hawker)
Synergy | 20 | Computed | 10 co-location scores (cafe×office, grocery×residential...)
Saturation | 13 | Computed | 5 categories: supply/demand ratio + gap (pop_total denominator)
Satellite | 12 | VIIRS+WP+WC | Nightlight change, pop growth, land cover
Archetypes | 15 | K-means | 6 types + 8 indices (vitality, accessibility, demand...) + 4 proxies
Micrograph | 156 | Pipeline | 12 categories × 13 context vectors
Spatial context | 123 | H3 rings | Ring-1/2 max + pop-weighted aggregates
LTA dynamic | 10 | Live API | Taxi, carpark, speed, congestion, car dependency
Structure | 8 | Cross-scale | Interface, gradient, demand flow, ecosystem, self-containment
Property | 2 | HDB resale | Median PSF, transaction count
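
The Place composition pillar lists entropy and HHI among its features. The report does not give the exact definitions it uses, so the following is a minimal sketch under assumptions: Shannon entropy in nats (no normalization) and the standard Herfindahl-Hirschman index over category shares. The function names and the sample cell are hypothetical.

```python
import math
from collections import Counter

def category_shares(categories):
    """Share of places per category within one cell."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def shannon_entropy(shares):
    """Shannon entropy in nats; 0 when one category holds every place."""
    return -sum(p * math.log(p) for p in shares.values() if p > 0)

def hhi(shares):
    """Herfindahl-Hirschman index: sum of squared shares (1.0 = fully concentrated)."""
    return sum(p * p for p in shares.values())

# Hypothetical hex with 10 places across three categories.
cell = ["cafe"] * 5 + ["grocery"] * 3 + ["clinic"] * 2
shares = category_shares(cell)
print(round(shannon_entropy(shares), 3), round(hhi(shares), 3))
```

High entropy with low HHI indicates a mixed cell; low entropy with high HHI indicates one dominant category.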

Place Features (114 per place)

Group | # | Key features
Identity | 14 | Category (24 types), price tier, branded, h3 keys
Competition | 5 | competitors_200m/500m, substitution_risk, market_share
Complementary | 5 | Cross-category diversity 300m, score
Anchors | 19 | 14 types (MRT, bus, hawker, clinic, park, hotel, school, library, sports...)
Demand + synergy | 18 | 6 pulls + demand_context + 10 synergies (target-category-only)
Transit | 8 | Network walk MRT/bus, GTFS headway, transit_score
Catchment | 5 | pop, elderly, nonresident, daytime intensity
Context | 16+ | Building, neighborhood, archetype, indices, nightlight
Supply-demand | 5 | saturation, demand_match, survivability_index

~200K Graph Nodes: feature-rich (114-637d)
1.49M Graph Edges: 39 relation types
128d Embeddings: R-GCN two-head
Plexis: πλεξις, "weaving"

Four spatial levels + knowledge graph

Flat features answer WHAT (this hex has 42K pop, 0.2x cafe saturation). The graph answers WHY (because MRT funnels 125K commuters, adjacent hexes leak demand here, void-deck economy captures captive residents) and WHAT IF (remove the MRT → which places shift most?).
Level | Units | Features | Resolution | Role
Hex-9 | 7,318 | 612 | ~174m | Fine-grain context, walkability, micrograph
Hex-8 | 1,191 | 637 | ~461m | Primary: demand, gaps, archetypes, ecosystem
Subzone | 326 | ~449 | URA | Policy alignment
Places | 174,711 | 114 | Point | Competition, synergy, survivability
Plexis | ~200K | 1.49M edges | 39 types | Structural reasoning, scenarios

Graph edge families

Family | Edges | Key relations | What it captures
Commercial | 941,834 | COMPETES, SYNERGIZES, SUBSTITUTES, EXIT_FRONTAGE, VOID_DECK | How businesses interact
Hierarchy | 364,058 | LOCATED_IN, IS_A, PARENT_OF, PART_OF | Containment + classification
Anchor | 121,053 | ANCHORED_BY, WALK_CATCHMENT, SERVES | Demand generators
Spatial | 27,551 | ADJACENT_TO, N/S/E/W_OF, ROAD, COASTAL | Physical connectivity
Structure | 10,758 | SAME_CLUSTER, LU_TRANSITION, DEVELOPMENT_FRONT | Urban evolution
Gradient | 9,263 | COMMERCIAL/HEIGHT/DENSITY/PRICE gradients | Change across space
Transit | 5,770 | CONNECTS_TO, FEEDS_INTO, SAME_CORRIDOR, EXPRESSWAY | Movement network
Supply-demand | 5,260 | UNDERSUPPLIED, OVERSUPPLIED, DEMAND_LEAKS, COMPARABLE | Gaps & opportunities

Two-head R-GCN embedding

Spatial Head (64d): Encodes WHERE. Trained on hierarchy + spatial + transit edges. Captures walkability, population, ecosystem.
Commercial Head (64d): Encodes WHAT. Trained on commercial + supply-demand edges. Captures category, competition, demand match.
Full (128d): Concatenation of both. Used for similarity, expansion, anomaly detection, scenarios.
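
The concatenation step can be sketched as follows. This is an illustration only: the helper names are hypothetical, the vectors are toy values rather than trained R-GCN outputs, and cosine similarity is an assumed choice of similarity measure that the report does not confirm.

```python
import math

SPATIAL_DIM = 64      # spatial head width, from the report
COMMERCIAL_DIM = 64   # commercial head width, from the report

def full_embedding(spatial, commercial):
    """Concatenate the two 64d heads into one 128d vector."""
    if len(spatial) != SPATIAL_DIM or len(commercial) != COMMERCIAL_DIM:
        raise ValueError("each head must be 64-dimensional")
    return list(spatial) + list(commercial)

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Two toy nodes: same spatial context, opposite commercial profile.
a = full_embedding([1.0] * 64, [1.0] * 64)
b = full_embedding([1.0] * 64, [-1.0] * 64)

print(len(a))                        # dimensionality of the full vector
print(cosine(a[:64], b[:64]))        # spatial head only
print(cosine(a[64:], b[64:]))        # commercial head only
print(cosine(a, b))                  # full vector
```

Slicing the full vector lets a query compare on location alone, on commercial profile alone, or on both.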

10 use cases powered by features + graph + embeddings

Each follows: start at entity → traverse typed edges → combine with flat features → produce ranked results + explanation.

1. Brand expansion

Build brand centroid from existing outlet embeddings → find hex-8 cells with similar graph structure but brand absent → rank by similarity × (1 − saturation).
Each brand gets different locations. Starbucks ≠ KFC ≠ Guardian.
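
A minimal sketch of this ranking, under stated assumptions: embeddings are shortened to 3 dimensions for readability, similarity is cosine, and saturation is clamped to [0, 1] so that (1 − saturation) stays non-negative. The cell names, field names and helper functions are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_expansion_cells(outlet_embeddings, cells):
    """Rank brand-absent cells by similarity x (1 - saturation), best first."""
    brand = centroid(outlet_embeddings)
    ranked = []
    for cell in cells:
        if cell["brand_present"]:
            continue  # only cells where the brand has no outlet yet
        # Assumption: saturation is clamped to [0, 1].
        headroom = 1.0 - min(max(cell["saturation"], 0.0), 1.0)
        ranked.append((cosine(brand, cell["embedding"]) * headroom, cell["name"]))
    return sorted(ranked, reverse=True)

outlets = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]  # toy embeddings of existing outlets
cells = [
    {"name": "cell_a", "embedding": [1.0, 0.0, 0.0], "saturation": 0.2, "brand_present": False},
    {"name": "cell_b", "embedding": [1.0, 0.0, 0.0], "saturation": 0.9, "brand_present": False},
    {"name": "cell_c", "embedding": [0.0, 0.0, 1.0], "saturation": 0.0, "brand_present": False},
    {"name": "cell_d", "embedding": [1.0, 0.0, 0.0], "saturation": 0.0, "brand_present": True},
]
ranking = rank_expansion_cells(outlets, cells)
for score, name in ranking:
    print(name, round(score, 3))
```

Because the centroid is built per brand, two brands with different outlet footprints produce different rankings over the same cells.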

2. Explain success

Traverse edges from any place: ANCHORED_BY MRT (transit demand), SYNERGIZES_WITH offices (lunch trade), VOID_DECK_OF HDB (captive demand). Graph tells the story.

3. Scenario simulation

"What if this MRT closes?" → Remove edges → recompute embeddings → places that shift most = most dependent. Quantifies infrastructure impact.

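The remove, recompute, compare loop can be sketched as below. The `embed` function is a deliberately simple stand-in (one round of neighbour averaging), not the R-GCN used in the report, and the node names and feature values are hypothetical. Only the control flow is meant to carry over.

```python
import math

def embed(features, edges):
    """Toy stand-in for the R-GCN: average each node's features with its neighbours'."""
    neighbours = {node: [] for node in features}
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    out = {}
    for node, feat in features.items():
        group = [feat] + [features[m] for m in neighbours[node]]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

def shift_after_removal(features, edges, removed):
    """Drop one node and its edges, re-embed, and rank the rest by embedding shift."""
    before = embed(features, edges)
    kept_edges = [(s, d) for s, d in edges if removed not in (s, d)]
    kept_features = {n: f for n, f in features.items() if n != removed}
    after = embed(kept_features, kept_edges)
    shifts = {n: math.dist(before[n], after[n]) for n in after}
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)

features = {"mrt": [10.0, 0.0], "cafe": [1.0, 1.0], "shop": [1.0, 1.0], "bar": [1.0, 1.0]}
edges = [("cafe", "mrt"), ("cafe", "shop"), ("shop", "bar")]
ranking = shift_after_removal(features, edges, "mrt")
for node, shift in ranking:
    print(node, round(shift, 3))
```

In this toy graph only the cafe is linked to the station, so it is the only node whose embedding moves.
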
4. True food deserts

Follow UNDERSUPPLIED edges → check DEMAND_LEAKS_TO → no leak = true desert. Filters out CBD false positives.
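
A sketch of that filter over an edge list. The edge layout is an assumption: each edge is a (source, relation, target) triple, UNDERSUPPLIED points from a hex to a category, and DEMAND_LEAKS_TO points from a hex to the neighbouring hex that absorbs its demand. Hex names are hypothetical.

```python
def true_deserts(edges, category="food"):
    """Hexes undersupplied in `category` whose demand does not leak to any other hex."""
    undersupplied = {
        src for src, rel, dst in edges
        if rel == "UNDERSUPPLIED" and dst == category
    }
    leaking = {src for src, rel, dst in edges if rel == "DEMAND_LEAKS_TO"}
    return sorted(undersupplied - leaking)

edges = [
    ("hex_cbd", "UNDERSUPPLIED", "food"),
    ("hex_cbd", "DEMAND_LEAKS_TO", "hex_mall"),   # demand is served next door
    ("hex_estate", "UNDERSUPPLIED", "food"),      # no leak edge: nowhere else to go
    ("hex_mall", "OVERSUPPLIED", "food"),
]
print(true_deserts(edges))
```

The CBD-style hex drops out because its shortfall is absorbed by a neighbour; the estate remains.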

5. Location-category fit

"Luxury restaurant here?" → Check COMPARABLE_TO (are comparables luxury areas?), PRICE_GRADIENT (gentrifying?), ANCHORED_BY hotels. Multi-signal reasoning.

6. Demand decomposition

Count incoming edges by type: ANCHORED_BY MRT = 35%, VOID_DECK = 25%, SYNERGIZES offices = 20%. Attribution, not just a number.
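
The attribution can be sketched as a share-of-edges count. Assumptions: edges are (source, relation, target) triples stored so that demand-generating relations point at the place, and each edge counts equally (no edge weights). The edge counts are hypothetical and chosen to reproduce the percentages quoted above.

```python
from collections import Counter

def demand_shares(edges, place):
    """Fraction of a place's incoming edges contributed by each relation type."""
    incoming = Counter(rel for src, rel, dst in edges if dst == place)
    total = sum(incoming.values())
    if total == 0:
        return {}
    return {rel: n / total for rel, n in incoming.most_common()}

# Hypothetical place with 20 incoming edges.
edges = (
    [(f"mrt_{i}", "ANCHORED_BY", "place_x") for i in range(7)]
    + [(f"hdb_{i}", "VOID_DECK_OF", "place_x") for i in range(5)]
    + [(f"office_{i}", "SYNERGIZES_WITH", "place_x") for i in range(4)]
    + [(f"rival_{i}", "COMPETES_WITH", "place_x") for i in range(4)]
)
shares = demand_shares(edges, "place_x")
for rel, share in shares.items():
    print(rel, f"{share:.0%}")
```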

7. Anomaly detection

Place embedding vs hex embedding similarity. Low = structural misfit (retail in residential, bar in industrial). Flags risky locations.
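
A minimal sketch of the misfit check, assuming cosine similarity between a place and its containing hex and a hypothetical cut-off of 0.2; the report states neither. Embeddings are shortened to 3 dimensions and all names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def flag_misfits(places, hex_embeddings, threshold=0.2):
    """Return (place, similarity) pairs below the threshold, worst fit first."""
    flagged = []
    for name, (hex_id, embedding) in places.items():
        sim = cosine(embedding, hex_embeddings[hex_id])
        if sim < threshold:  # hypothetical cut-off
            flagged.append((name, round(sim, 3)))
    return sorted(flagged, key=lambda item: item[1])

hex_embeddings = {
    "hex_residential": [1.0, 0.1, 0.0],
    "hex_industrial": [0.0, 0.1, 1.0],
}
places = {
    "kopitiam": ("hex_residential", [0.9, 0.2, 0.0]),  # fits its surroundings
    "bar": ("hex_industrial", [1.0, 0.0, 0.05]),       # structural misfit
}
misfits = flag_misfits(places, hex_embeddings)
print(misfits)
```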

8. Evolution tracking

DEVELOPMENT_FRONT + GRADIENT increasing + SAME_CLUSTER joining → "Tengah is the next Punggol." Growth prediction.

9. Competitive landscape

Traverse COMPETES_WITH → classify by synergy overlap: 46 threats + 12 allies + 27 substitutes. Net competitive position.

10. Cross-city transfer

Same relation schema for SGP + HKG → find HKG neighborhood matching Toa Payoh's graph structure. Universal urban patterns.

Live demo result: Bubble tea expansion

Top 3: Matilda/Punggol (36K pop, 125K transit, 0 BBT, sat 0.3x), Rivervale/Sengkang (43K pop, 0 BBT), Jurong West Central (48K pop, sat 0.2x). All three are Dense HDB areas with young families and heavy undersupply (BBT = bubble tea outlets, sat = saturation). Returned in 1.6 seconds.

What the 128d Plexis embedding captures

R² = how much of each feature can be predicted from the 128d embedding alone. Higher = the graph encodes this information well.
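
For reference, R² is the coefficient of determination: one minus the ratio of residual to total sum of squares. The sketch below shows only that calculation. The report does not say which model maps the embedding to each feature or how the data was split, so the predicted values here are hypothetical placeholders.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical feature values and predictions made from the embedding.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(actual, predicted), 3))
```

A value of 1.0 means the feature is fully recoverable from the embedding; 0.0 means the prediction is no better than always guessing the mean.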

Hex-8 R² (embedding → hex feature)

Feature | R² | Quality
walkability_score | 0.898 | EXCELLENT
pull_residential | 0.894 | EXCELLENT
ecosystem_completeness | 0.827 | STRONG
population | 0.777 | GOOD
pull_office | 0.667 | MODERATE
pc_total | 0.661 | MODERATE
transit_daily_taps | 0.648 | MODERATE

Place R² (embedding → place feature)

Feature | R² | Quality
anchor_score | 0.908 | EXCELLENT
demand_context_score | 0.884 | EXCELLENT
competitors_200m | 0.774 | GOOD
complementary_diversity | 0.696 | GOOD
transit_score | 0.661 | MODERATE
survivability_index | 0.477 | MODERATE

Other metrics

Metric | Value | What it means
Category classification | 69.8% | Embedding alone predicts business type 70% of the time (24 categories)
Category separability | 310x | Same-category places are 310x more similar than different-category in commercial head
Retrieval P@5 | 0.100 | 10% of top-5 nearest neighbors are same category (vs 4% baseline)
Archetype NMI | 0.362 | Embedding clusters partially recover pre-computed neighborhood archetypes
Link prediction Hits@10 | 14.1% | True connected node appears in top-10 predictions 14% of the time
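
The two ranking metrics in the table can be sketched per query as below; the reported figures would then be averages over all queries. The function names and sample lists are hypothetical, and the sketch assumes the neighbour and candidate lists are already sorted best-first.

```python
def precision_at_k(neighbour_categories, query_category, k=5):
    """Fraction of the top-k nearest neighbours sharing the query's category."""
    top = neighbour_categories[:k]
    return sum(1 for cat in top if cat == query_category) / k

def hits_at_k(ranked_candidates, true_node, k=10):
    """1.0 if the true linked node appears in the top-k predictions, else 0.0."""
    return 1.0 if true_node in ranked_candidates[:k] else 0.0

# Hypothetical query: a cafe and the categories of its five nearest neighbours.
p5 = precision_at_k(["cafe", "bar", "cafe", "gym", "bar"], "cafe")

# Hypothetical link prediction: 12 candidates ranked best-first.
candidates = list("abcdefghijkl")
hit = hits_at_k(candidates, "j")    # rank 10, inside the top 10
miss = hits_at_k(candidates, "k")   # rank 11, outside the top 10
print(p5, hit, miss)
```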

Dynamic data adds up to +10 points of R²

Target | Static features only | + LTA dynamic | Gain
ecosystem_completeness | 0.715 | 0.818 | +10.3%
saturation_fnb | 0.479 | 0.524 | +4.4%
idx_vitality | 0.935 | 0.953 | +1.8%

Summary by dimension

What embedding captures | R² range | Verdict
Structural / spatial (walkability, ecosystem, population) | 0.78 – 0.90 | EXCELLENT: these ARE graph properties
Demand (pull_residential, demand_context, anchor_score) | 0.88 – 0.91 | EXCELLENT: demand flows through edges
Competition (competitors, diversity) | 0.70 – 0.77 | GOOD: competition is graph density
Transit (taps, transit_score) | 0.65 – 0.66 | MODERATE: hub-dependent
Viability (survivability) | 0.48 | MODERATE: supply-side beyond graph
Category identity | 69.8% acc, 310x sep | STRONG: commercial head works

Production formula: score = embedding_similarity × (1 − saturation) × demand_match
Features handle supply. Graph handles structure. Together they support the ten use cases above.