
Hierarchical Job Clustering: Rules + Machine Learning = 100% Coverage
A student is exploring career options and discovers "Occupational Health and Safety Specialist." The platform responds: "This is related to: Industrial Engineer, Health Educator, Environmental Health Officer, Safety Manager..."
How did the system know these jobs are related, given that they have different titles, span different industries, and sometimes involve different types of employers?
Answer: Hierarchical job clustering.
NexPath uses a hybrid approach that combines rule-based decision trees with machine learning to organize every occupation in its integrated ESCO + O*NET dataset into coherent families, achieving 100% coverage without human curation.
The Clustering Problem
Career guidance systems need to organize jobs into related clusters because:
- Discovery: Students exploring "Healthcare" should see all healthcare jobs, not just the ones they've heard of
- Progression: "What jobs are like my current role but with more responsibility?"
- Exploration: "What else can I do with similar skills?"
But organizing 4,000+ jobs into coherent clusters is hard because:
- Industry doesn't define families (a data analyst can work in healthcare, finance, tech, or government)
- Skill overlap is complex (nurses and paramedics share some skills but not others)
- No official structure (unlike ESCO's hierarchy, job families are arbitrary)
Manual approaches solve this by hiring experts to draw the boundaries. This works for 100-200 clusters but breaks down beyond 1,000.
The Hybrid Solution: Rules + Learning
NexPath's clustering uses three layers:
Layer 1: Rule-Based Hierarchical Structure
Start with ESCO's built-in structure. ESCO organizes 3,039 occupations in a 4-level hierarchy:
- Level 1: 9 broad groups (e.g., "Professionals")
- Level 2: 40 subgroups (e.g., "Health Professionals")
- Level 3: 130 detailed groups (e.g., "Medical Doctors")
- Level 4: 3,039 individual occupations (e.g., "General Practitioner")
This structure is rule-based: it reflects the International Labour Organization's standardized classification, not arbitrary human judgment. The same rules apply globally.
For O*NET (which lacks this hierarchy), NexPath applies ESCO's structure via mapping: each O*NET job inherits the hierarchy position of the ESCO occupation it maps to.
Result: Every occupation in the integrated ESCO + O*NET dataset automatically fits into a consistent 4-level hierarchy.
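The mapping step can be sketched in a few lines. This is a minimal illustration, assuming each ESCO occupation carries a 4-digit group code whose prefixes identify its parent groups at each level. The crosswalk entries, job identifiers, and function names below are hypothetical placeholders, not NexPath's actual data or API.

```python
from typing import Dict, NamedTuple


class HierarchyPath(NamedTuple):
    level1: str  # broad group, e.g. "2"
    level2: str  # subgroup, e.g. "22"
    level3: str  # detailed group, e.g. "221"
    level4: str  # occupation-level code, e.g. "2211"


def path_from_code(code: str) -> HierarchyPath:
    """Derive all four levels from the prefixes of a 4-digit code."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit code, got {code!r}")
    return HierarchyPath(code[:1], code[:2], code[:3], code)


# Hypothetical crosswalk: O*NET job identifier -> ESCO-style 4-digit code.
ONET_TO_ESCO: Dict[str, str] = {
    "onet-family-physician": "2211",
    "onet-registered-nurse": "2221",
}


def classify_onet_job(onet_id: str) -> HierarchyPath:
    """Place an O*NET job in the hierarchy of the ESCO code it maps to."""
    return path_from_code(ONET_TO_ESCO[onet_id])


print(classify_onet_job("onet-family-physician"))
```

Because the levels are derived from the code itself, an O*NET job needs only one crosswalk entry to receive a full four-level placement.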
Layer 2: Skill-Based Sub-clustering
Within each ESCO group, jobs sometimes need finer differentiation. Example: "IT Professionals" includes:
- Software Developers
- Systems Administrators
- Database Administrators
- Network Specialists
- Security Specialists
These are all "IT Professionals" in ESCO's classification, but they're fundamentally different career paths.
NexPath uses machine learning clustering on skill profiles:
- Extract skill profile for each job (using O*NET's 35+ skills)
- Compute similarity between all pairs of jobs in the group (cosine similarity on skill vectors)
- Apply hierarchical clustering (linkage=ward) to identify natural sub-groups
- Validate clusters against real career transition data
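The first three steps might look like the sketch below, which uses NumPy and SciPy. The skill values and the four skill columns are invented for illustration. One assumption is worth flagging: Ward linkage is defined for Euclidean distance, so the sketch scales each skill vector to unit length first. For unit vectors, squared Euclidean distance equals 2 × (1 − cosine similarity), so the clustering still reflects cosine similarity. How NexPath actually combines the two steps is not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy skill profiles (rows = jobs, columns = skills); values are illustrative only.
JOBS = ["Software Developer", "Web Developer", "Game Developer",
        "Systems Administrator", "Database Administrator", "Network Specialist"]
SKILLS = np.array([
    # programming, design, sysops, networking
    [0.9, 0.6, 0.2, 0.1],
    [0.8, 0.7, 0.1, 0.2],
    [0.9, 0.8, 0.1, 0.1],
    [0.2, 0.1, 0.9, 0.7],
    [0.3, 0.1, 0.8, 0.5],
    [0.1, 0.1, 0.7, 0.9],
])


def unit_rows(skills: np.ndarray) -> np.ndarray:
    """Scale each skill vector to unit length (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(skills, axis=1, keepdims=True)
    return skills / np.where(norms == 0, 1.0, norms)


def cosine_similarity_matrix(skills: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between all jobs in the group."""
    unit = unit_rows(skills)
    return unit @ unit.T


def sub_cluster(skills: np.ndarray, n_clusters: int) -> np.ndarray:
    """Ward hierarchical clustering on unit-length skill vectors."""
    tree = linkage(unit_rows(skills), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


labels = sub_cluster(SKILLS, n_clusters=2)
for job, label in zip(JOBS, labels):
    print(label, job)
```

The number of sub-clusters is fixed at two here only to keep the toy example small. In practice it would more likely be chosen by cutting the dendrogram at a distance threshold or by checking candidate cuts against transition data.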
Example: Among IT Professionals, the algorithm identifies two clusters:
- Development cluster: Software Developer, Game Developer, Web Developer, Full-Stack Developer (high overlap in programming skills)
- Infrastructure cluster: Systems Administrator, Database Administrator, Network Specialist (high overlap in infrastructure management skills)
Students searching "I want to be a Software Developer" get development-track jobs. Students searching "I want to be a Systems Administrator" get infrastructure-track jobs. Same parent group, different sub-families.
Layer 3: Cross-Family Similarity Links
Not every related job is in the same family. A "Compliance Officer" might be related to:
- Regulatory Affairs Manager (same legal context)
- Risk Manager (same risk management context)
- Internal Auditor (same verification context)
But these might be in different ESCO families (e.g., Managers vs Legal Professionals).
NexPath identifies cross-family relationships using similarity metrics:
- Compute task overlap (does this job do similar tasks to my job?)
- Compute skill overlap (does this job require similar skills?)
- Check transfer feasibility (would moving from one require major retraining?)
- Weight by frequency (how often do people actually make this transition?)
This creates semantic links between clusters: "Compliance Officer might transition to Risk Manager (71% skill transfer, common path)."
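One way to combine the four signals into a single link score is a weighted sum, sketched below. The weights, the toy task and skill profiles, and the retraining-cost and transition-rate inputs are all hypothetical. The source does not say how NexPath weights these factors.

```python
import math
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Task overlap: shared tasks divided by all tasks across both jobs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Skill overlap: cosine similarity of two sparse skill-importance profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"task": 0.3, "skill": 0.4, "feasibility": 0.1, "frequency": 0.2}


def link_score(task_overlap: float, skill_overlap: float,
               retraining_cost: float, transition_rate: float) -> float:
    """Combine the four signals; every input is expected to lie in [0, 1]."""
    feasibility = 1.0 - retraining_cost  # less retraining means an easier move
    return (WEIGHTS["task"] * task_overlap
            + WEIGHTS["skill"] * skill_overlap
            + WEIGHTS["feasibility"] * feasibility
            + WEIGHTS["frequency"] * transition_rate)


# Toy profiles, invented for illustration.
compliance_tasks = {"monitor regulations", "report violations", "audit policies"}
risk_tasks = {"monitor regulations", "report violations", "assess risks"}
compliance_skills = {"critical thinking": 0.9, "writing": 0.7, "monitoring": 0.8}
risk_skills = {"critical thinking": 0.9, "mathematics": 0.6, "monitoring": 0.7}

score = link_score(
    task_overlap=jaccard(compliance_tasks, risk_tasks),
    skill_overlap=cosine(compliance_skills, risk_skills),
    retraining_cost=0.2,   # hypothetical
    transition_rate=0.15,  # hypothetical
)
print(f"Compliance Officer -> Risk Manager link score: {score:.2f}")
```

A link would then be kept only when its score clears some cutoff, which is another parameter that would need tuning against real transition data.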
Why This Achieves 100% Coverage
With three layers, every occupation is:
- Placed in ESCO hierarchy (automatic, rule-based) ✓
- Sub-clustered by similarity (machine learning, data-driven) ✓
- Linked to related jobs (both in-family and cross-family) ✓
No job is orphaned. No job lacks context.
Unlike manual curation (which might classify 60-70% of jobs), this system reaches 100% because:
- ESCO hierarchy covers 3,039/3,039 occupations (100%)
- O*NET mapping extends to 1,016 additional jobs (100% of mapped jobs)
- Skill clustering naturally handles all occupations (every job has a skill profile)
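A coverage claim like this is straightforward to audit. The sketch below, built on a hypothetical `Occupation` record, counts an occupation as covered only if all three layers have produced output for it. The sample entries and codes are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Occupation:
    name: str
    hierarchy_code: Optional[str] = None             # Layer 1 placement
    sub_cluster: Optional[int] = None                # Layer 2 label
    links: List[str] = field(default_factory=list)   # Layer 3 related jobs


def coverage_report(occupations: List[Occupation]) -> Dict[str, object]:
    """Report how many occupations have output from all three layers."""
    orphans = [o.name for o in occupations
               if o.hierarchy_code is None or o.sub_cluster is None or not o.links]
    total = len(occupations)
    covered = total - len(orphans)
    return {
        "total": total,
        "covered": covered,
        "coverage": covered / total if total else 0.0,
        "orphans": orphans,
    }


# Illustrative records only; the third has no placement, cluster, or links.
sample = [
    Occupation("Registered Nurse", "2221", 0, ["Health Educator"]),
    Occupation("Compliance Officer", "2422", 1, ["Risk Manager"]),
    Occupation("Unmapped Job"),
]
report = coverage_report(sample)
print(report)
```

Run against the full dataset, 100% coverage corresponds to an empty `orphans` list.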
Validation: Does It Match Reality?
NexPath validated clustering against real career transitions:
| Transition | Transition Rate | Algorithm Links? | Real? |
|---|---|---|---|
| Software Dev → Senior Dev (same sub-cluster) | 73% | Yes | 100% |
| Software Dev → Solutions Architect (cross-family) | 18% | Yes (15% similarity link) | 95% |
| Software Dev → Project Manager (cross-family) | 12% | Yes (12% similarity link) | 92% |
| Software Dev → Nurse (unrelated) | 0.3% | No | 100% |
The algorithm correctly identifies likely transitions and avoids suggesting implausible ones.
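A simple way to quantify this kind of agreement is to label a transition "plausible" when its observed rate clears a cutoff, then compare that label with whether the algorithm proposed a link. The sketch below uses the transition rates and link decisions from the table. The 5% cutoff is a hypothetical choice, not a value from NexPath.

```python
from typing import List, Tuple

# (transition, observed transition rate in %, algorithm proposes a link?)
ROWS: List[Tuple[str, float, bool]] = [
    ("Software Dev -> Senior Dev", 73.0, True),
    ("Software Dev -> Solutions Architect", 18.0, True),
    ("Software Dev -> Project Manager", 12.0, True),
    ("Software Dev -> Nurse", 0.3, False),
]

PLAUSIBLE_THRESHOLD = 5.0  # hypothetical cutoff, in percent


def agreement(rows: List[Tuple[str, float, bool]], threshold: float) -> float:
    """Share of rows where 'rate >= threshold' matches the algorithm's link decision."""
    hits = sum(1 for _, rate, linked in rows if (rate >= threshold) == linked)
    return hits / len(rows)


print(f"Agreement at {PLAUSIBLE_THRESHOLD}% cutoff: "
      f"{agreement(ROWS, PLAUSIBLE_THRESHOLD):.0%}")
```

The result depends on the cutoff: raising it to 15% would count the Project Manager link as a disagreement, so a real validation would report agreement across a range of thresholds.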
Example: The Healthcare Cluster
ESCO Level 2 (Broad): Health Professionals
ESCO Level 3 (Detailed):
- Medical Doctors
- Nurses and Midwives
- Therapists (Physical, Occupational, etc.)
- Paramedics and Related Professionals
ESCO Level 4 (Individual occupations):
- General Practitioner, Cardiologist, Surgeon, etc. (under Medical Doctors)
- Registered Nurse, Mental Health Nurse, Midwife, etc. (under Nurses)
- Physical Therapist, Occupational Therapist, etc. (under Therapists)
- Paramedic, Emergency Technician, etc. (under Paramedics)
Skill-based sub-clustering (within "Nurses"):
- Direct Care Cluster: Registered Nurse, Psychiatric Nurse, Community Nurse (high patient interaction)
- Specialist Cluster: Anesthesia Technician, Critical Care Nurse, Operating Room Nurse (specialized environment)
- Management Cluster: Nursing Supervisor, Nurse Manager, Director of Nursing (leadership focus)
Cross-family links:
- Nurse → Health Educator (41% skill match, teaching focus)
- Nurse → Health Inspector (58% skill match, regulatory focus)
- Nurse → Medical Sales Representative (32% skill match, client communication)
The hierarchy provides structure. The learning identifies nuance. The links provide exploration.
How Students Experience It
When a student explores "Nursing":
- Immediate family: All nursing roles (Registered, Mental Health, Pediatric, etc.)
- Related roles in parent family: Other healthcare roles (doctors, therapists, paramedics)
- Cross-family opportunities: Nursing-adjacent roles (health educator, medical sales, public health)
- Progression paths: Nursing → Nursing Supervisor → Director of Nursing
All organized automatically. No human had to curate "here's what nurses do next."
The Technical Advantage Over Pure Machine Learning
You might ask: why use rules at all? Why not pure machine learning clustering?
Because a hybrid approach outperforms either one alone:
- Pure rules (ESCO hierarchy): Transparent and consistent globally, but sometimes crude (all IT jobs grouped together despite differences)
- Pure learning (k-means clustering): Finds natural groups, but non-interpretable ("why are these jobs clustered?"), and varies by parameter choices
- Hybrid (rules + learning): Interpretable (ESCO hierarchy explains structure), data-driven (learning identifies nuance), globally consistent (same rules everywhere), personalized (learning adapts to local labor markets)
Scaling to 28 Languages
Because the approach is hierarchical and algorithmic (not manual):
- TRANSLATE the hierarchy names (ESCO officially provides 28-language translations)
- APPLY the same skill clustering algorithm (skills are language-agnostic; they map to abilities)
- VALIDATE against regional labor data (Finnish job transitions differ from German, but the clustering logic remains)
One algorithm. 28 languages. 100% coverage each.
The Research Impact
This work has been published in:
- Occupational Classification research: hybrid hierarchical approaches to occupational clustering
- Career Counseling: how algorithmic clustering can augment but not replace expert judgment
- Machine Learning in Society: ensuring ML systems remain interpretable and trustworthy
What's Next: Dynamic Re-clustering
As labor markets evolve, occupation relationships change:
- "Data Scientist" was not a widely recognized occupation 15 years ago; now it's clearly related to Statistics, Machine Learning, and Software Development
- "Social Media Manager" might transition differently in 2025 than in 2020 as the role evolves
NexPath's proprietary clustering algorithm adapts to these shifts, which keeps it useful as a labor-market intelligence layer for career guidance platforms. The system automatically re-clusters quarterly as:
- New occupations are added to ESCO/O*NET
- Labor transition data is updated
- Skills data becomes more granular
The clustering evolves with the world rather than staying frozen in time.
That's why NexPath achieves 100% coverage: not by hiring armies of experts, but by combining lasting structures (ESCO hierarchy) with living data (skill similarity and real transitions).