Financial Services Task Inventory

A comprehensive inventory of 4,075 financial services tasks mapped to a 4-level taxonomy across 125 O*NET occupations — providing the foundation for understanding what work gets done, what skills it requires, and how roles are constructed from tasks.

4,075 Tasks Inventoried · 15 Business Functions · 125 O*NET Occupations · 486 Activity Categories
Tasks, Skills & Roles

Understanding the building blocks of work — and why tasks are the right unit of analysis.

Most workforce discussions start with roles — job titles, org charts, headcount. But roles are composites. To understand work at its most fundamental level, you need to decompose roles into their constituent parts: the tasks people perform and the skills those tasks require.

The Three Building Blocks

Task

A discrete unit of work with a clear verb-object structure: "Reconcile daily trade settlements," "Conduct annual credit reviews." Tasks are observable, measurable, and assignable. They are the atomic level of work.

This inventory captures 4,075 tasks at the L4 level — the most granular layer of the taxonomy.

Skill

A learned capability required to perform a task. Skills can be technical (financial modeling, SQL, KYC/AML), cognitive (analytical reasoning, judgment under ambiguity), or interpersonal (client advisory, negotiation). A single task typically requires 2–5 skills.

Each task in this inventory lists its required skills, enabling skill-based workforce analysis.

Role

A bundle of tasks assigned to one person or job title. Roles exist because organizations need to group tasks into manageable work packages. But role boundaries are often inherited from legacy structures rather than designed from first principles.

The inventory maps tasks to primary roles and O*NET occupational codes for cross-referencing.

How They Relate

Tasks → Skills: Every task demands specific skills to execute. By inventorying tasks, you implicitly map the skills landscape of your organization. Skill gaps become visible when you can see exactly which tasks require capabilities your workforce hasn't developed.

Tasks → Roles: A role is defined by which tasks it contains. Two roles with different titles but overlapping tasks may be candidates for consolidation. A role whose tasks span wildly different skill profiles may be a candidate for splitting.

Skills → Roles: Shared skills create natural job families — groups of roles with overlapping competency requirements. Career mobility is highest within a job family because the skill transfer cost is lowest. This is the foundation of career pathing.
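A minimal sketch of these relations in code, using invented task and role names (nothing here is drawn from the inventory itself):

```python
# Illustrative data model: tasks carry skills; roles are bundles of tasks.
tasks = {
    "T1": {"name": "Reconcile daily trade settlements", "skills": {"Reconciliation", "SQL"}},
    "T2": {"name": "Conduct annual credit reviews", "skills": {"Credit Analysis", "Judgment"}},
    "T3": {"name": "Draft credit review memos", "skills": {"Credit Analysis", "Writing"}},
}
roles = {
    "Settlements Analyst": {"T1"},
    "Credit Analyst": {"T2", "T3"},
}

# Tasks -> Skills: a role's skill footprint is the union over its tasks
def skill_footprint(role):
    return set().union(*(tasks[t]["skills"] for t in roles[role]))

# Skills -> Roles: shared skills define job-family proximity (Jaccard overlap)
def skill_overlap(r1, r2):
    a, b = skill_footprint(r1), skill_footprint(r2)
    return len(a & b) / len(a | b)

print(skill_footprint("Credit Analyst"))   # {'Credit Analysis', 'Judgment', 'Writing'}
print(skill_overlap("Settlements Analyst", "Credit Analyst"))  # 0.0 -> different families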

Why Start with Tasks?

Tasks Are Stable

Job titles change with reorganizations, mergers, and market trends. But the underlying work — reconciling ledgers, assessing credit risk, advising clients — persists. Tasks are the durable unit.

Tasks Are Comparable

A "Relationship Manager" at one bank may do very different work than a "Relationship Manager" at another. But a task like "Conduct annual credit reviews and covenant compliance checks" means the same thing everywhere.

Tasks Are Measurable

Each task can be independently assessed for complexity (Bloom's taxonomy), frequency, regulatory burden, cross-functional scope, and — critically — its exposure to technological change including AI.

Tasks Enable Redesign

When you understand work at the task level, you can reassemble it: combine tasks into new roles, identify which tasks can be automated or augmented, and design job hierarchies grounded in what people actually do rather than inherited structures.

The 4-Level Taxonomy

This inventory organizes financial services work into a structured hierarchy that moves from the broadest organizational level down to individual tasks:

  • L1 Function: 15 business functions (e.g., Retail Banking, Risk Management)
  • L2 Process: 164 process groups (e.g., Consumer Lending, Branch Sales)
  • L3 Activity: 486 activity clusters (e.g., Mortgage Origination, Credit Underwriting)
  • L4 Task: 4,075 discrete tasks (e.g., Originate Residential Mortgage Applications)

Navigating this tool: The Dashboard gives you a statistical overview. The Explorer lets you filter, search, and drill into individual tasks. The advisory sections — Role Mapping, SWP, and Job Hierarchy Redesign — show how to apply task-level data to organizational decisions.
Inventory at a Glance

Key metrics and distributions across the financial services task inventory.

4,075 Total Tasks · 125 O*NET Occupations · Median Bloom's Level: 3 · Regulatory-Driven: 31.1% · Cross-Functional: 23.6% · Eloundou β (LLM Exposure): 0.888

Dashboard charts: Cognitive Complexity (Bloom's) · Tasks by Function & Disposition · AI Exposure Class (E0/E1/E2) · Agentic AI Potential · Defense Line Composition
Key Insight: The inventory spans the full spectrum of cognitive complexity, from Bloom's level 1 (recall and data entry) to level 6 (strategic creation and design). 31.1% of tasks are regulatory-driven, reflecting the heavily governed nature of financial services. The Eloundou β of 0.888 indicates that financial services has structurally high LLM exposure: 81.2% of tasks are classified E1 (direct LLM exposure), 15.3% E2 (LLM + tools), and only 3.5% E0 (no LLM exposure).
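A quick arithmetic check ties the headline β to the reported distribution (the published shares are themselves rounded, so the product reproduces 0.888 up to rounding):

```python
# beta = share_E1 + 0.5 * share_E2 (E0 contributes 0), per the formula defined below
share_e0, share_e1, share_e2 = 0.035, 0.812, 0.153
beta = share_e1 + 0.5 * share_e2
print(f"beta = {beta:.4f}")  # 0.8885, consistent with the reported 0.888
```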
Methodology & Data Sources

A transparent accounting of how this inventory was constructed, enriched, and validated.

1

Taxonomy Construction

Built a 4-level hierarchy: 15 L1 business functions, 164 L2 processes, 486 L3 activities, and 4,075 L4 tasks. Anchored to a purpose-built O*NET FinServ database (125 financial services occupations, 2,530 task statements, 80,000+ enrichment records) and validated against Canadian Big 5 bank operations with 99.3% task match rate.

2

Task Enrichment

Each task is characterized across multiple dimensions: cognitive complexity (Bloom's taxonomy 1–6), business importance (1–5), frequency, regulatory classification, cross-functional scope, defense line, required skills, and primary roles. These attributes support diverse analytical use cases — from skills gap analysis to organizational design.

3

AI Exposure Assessment

Each task is classified using the categorical E0/E1/E2 rubric from Eloundou et al. (2023, “GPTs are GPTs,” Science): E0 = no LLM exposure (physical, embodied, in-person tasks), E1 = direct LLM exposure (writing, code, summarization — LLM alone reduces task time by 50%+), E2 = LLM + tools exposure (data analysis, system integration — requires additional software beyond the LLM). No continuous 0–100 score is used; the classification is categorical per the published rubric. The Eloundou β measure (β = [E1 + 0.5×E2] / total tasks) provides a single aggregation metric at the role or occupation level. Published AIOE z-scores (Felten, Raj & Seamans, 2021, Strategic Management Journal) are available at the occupation level for cross-validation.

AI Exposure Classification
| Component | Source | Method | Output |
|---|---|---|---|
| Task Classification | Eloundou et al. (2023) “GPTs are GPTs” (Science) | Each task is independently classified as E0 (no LLM exposure: physical, embodied, in-person), E1 (direct LLM exposure: writing, code, summarization, Q&A — 50%+ time reduction), or E2 (LLM + tools: data analysis, system integration, retrieval) | Categorical label: E0, E1, or E2 |
| Aggregation | Eloundou et al. (2023) β measure | β = [E1 + 0.5 × E2] / total tasks. Ranges from 0 (no exposure) to 1 (all tasks directly LLM-exposed). Computable at role, function, or occupation level. | Current inventory β = 0.888 |
| Cross-Validation | Felten, Raj & Seamans (2021) AIOE Index (Strategic Management Journal) | Published occupation-level z-scores for 85 matched financial services occupations. AIOE maps 10 AI capabilities to 52 O*NET abilities. | Available for occupation-level benchmarking (not used for task-level scoring) |
Why categorical? The Eloundou rubric is inherently categorical — it classifies whether an LLM can meaningfully reduce task completion time, not by how much. Published research does not provide a validated methodology for converting E0/E1/E2 to a continuous 0–100 percentage. Using the categorical classification as published ensures methodological integrity.
Why E1 dominates: Financial services is overwhelmingly cognitive work. Felten et al. found that the abilities most exposed to AI — Information Ordering (1.91), Memorization (1.69), Deductive Reasoning (1.04) — are core to nearly every FS role, while the least exposed abilities involve physical dexterity and strength, which FS rarely requires. Current distribution: E0 = 3.5%, E2 = 15.3%, E1 = 81.2%. This means FS as a sector has structurally higher AI exposure than the economy-wide average.
Empirical Validation: Anthropic Economic Index

Where available, tasks are cross-referenced against the Anthropic Economic Index (January 2026 release), which reports empirical task-level success rates from 2 million real Claude.ai conversations (November 2025 data). This provides a real-world complement to the theoretical Eloundou classification.

| Metric | Value | Notes |
|---|---|---|
| Tasks matched | 1,709 of 4,075 (41.9%) | Fuzzy-matched by task description against 2,506 O*NET tasks in AEI dataset |
| Mean success rate | 66.5% | Percentage of conversations where Claude successfully completed the task |
| Data source | HuggingFace: Anthropic/EconomicIndex | CC-BY license. Methodology in two arXiv papers. |
Source caveat: Anthropic is a commercial AI company. Success rates reflect Claude model capabilities specifically, not AI in general. The data is self-selecting (users choose tasks they expect Claude to handle), which inflates aggregate success rates. However, it is the only published dataset providing empirical task-level success metrics from real-world AI usage at scale.
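The exact matching pipeline behind the 41.9% figure is not reproduced here; a minimal stdlib sketch of description-level fuzzy matching might look like the following (the 0.85 cutoff is an assumption, and the TF-IDF approach in Runbook Step 2 is the more scalable variant):

```python
import difflib

def best_aei_match(task_desc, aei_descs, cutoff=0.85):
    """Return (best_matching_AEI_description, similarity), or (None, score) below cutoff.

    difflib ratio is a simple character-level similarity; a production matcher
    would more likely use TF-IDF or embeddings over the full description corpus.
    """
    best, best_score = None, 0.0
    for cand in aei_descs:
        score = difflib.SequenceMatcher(None, task_desc.lower(), cand.lower()).ratio()
        if score > best_score:
            best, best_score = cand, score
    return (best, best_score) if best_score >= cutoff else (None, best_score)
```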
Forward-Looking: Agentic AI Potential

The Eloundou (2023) rubric was calibrated to GPT-4–era chatbot capabilities. Since then, agentic AI — systems that autonomously execute multi-step workflows using tools, code execution, web browsing, and API orchestration — has substantially expanded what AI can do. No peer-reviewed framework yet measures this empirically at the task level (as of March 2026). As a forward-looking indicator, each task is flagged for Agentic Potential: the degree to which agentic AI capabilities would further increase AI’s utility beyond what a basic LLM chatbot provides.

| Level | Count | Definition | Example Patterns |
|---|---|---|---|
| High | 287 (7.0%) | Multi-step workflows, data pipelines, system integration, automated monitoring, code operations that agentic AI can orchestrate end-to-end | ETL pipelines, automated screening, regression testing, API orchestration, batch document processing |
| Medium | 624 (15.3%) | Data consolidation, report generation, cross-referencing that benefits from AI tool chains but may need human oversight | Reconciliation, report compilation, data extraction, cross-system analysis |
| Low | 3,164 (77.6%) | Physical/embodied tasks (E0) or primarily conversational/advisory tasks where agentic capabilities add little beyond a basic LLM | Client meetings, physical inspections, advisory conversations, manual approvals |
Methodological transparency: This flag is a constructed forward-looking indicator, not a research-validated metric. It is based on rule-based keyword/pattern matching against known agentic capability domains. It should be treated as directional, not definitive. We include it because the alternative — ignoring agentic AI entirely — would make the inventory materially incomplete as a decision tool.
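A minimal sketch of such a rule-based flag, with a deliberately small, illustrative pattern list (the production rule set is broader than what is shown here):

```python
import re

# Illustrative subset of agentic capability patterns; not the full rule set.
AGENTIC_HIGH = [r'(pipeline|\betl\b|orchestrat|\bapi\b|batch|regression test|automated screening)']
AGENTIC_MED = [r'(reconcil|report (generation|compilation)|data extraction|cross-system)']

def agentic_potential(task_text, e_class):
    text = task_text.lower()
    if e_class == 'E0':
        return 'Low'  # physical/embodied tasks gain little from agentic tooling
    if any(re.search(p, text) for p in AGENTIC_HIGH):
        return 'High'
    if any(re.search(p, text) for p in AGENTIC_MED):
        return 'Medium'
    return 'Low'

print(agentic_potential("Build ETL pipeline for regulatory reporting", "E2"))  # High
```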
Data Sources
| Source | Description | Contribution |
|---|---|---|
| O*NET FinServ | Purpose-built database: 125 occupations, 2,530 tasks, 37 tables, 80,000+ records including skills, knowledge, abilities, technology, certifications, and DWA/IWA activity hierarchy | Core task statements, skills/knowledge profiles, technology stacks, standardized activity mappings |
| Bank-Specific | Canadian banking domain expertise and Big 5 bank validation (50 real LinkedIn postings) | Institution-unique processes, Canadian regulatory context |
| Regulatory | FINTRAC, OSFI, Basel III, IFRS 9, TCFD frameworks | Compliance obligations |
| Certification | FINRA Series 7, CFA, FRM, CISSP outlines | Professional knowledge standards |
| AI-Era | Emerging tasks from AI/ML adoption in banking | MLOps, responsible AI, bias testing |
| Anthropic Economic Index | January 2026 release: 2,506 O*NET tasks with empirical success rates from 2M Claude.ai conversations (CC-BY, HuggingFace) | Real-world task-level success rates; fuzzy-matched to 1,709 inventory tasks |
Task Schema: 20 Fields

| Field | Type | Description |
|---|---|---|
| task_id | String | Unique ID encoding taxonomy path (e.g., RB.DEP.ACT.001) |
| task_name | String | Verb-object task name |
| task_description | String | Full description with regulatory/business context |
| L1_function | String | Business function (1 of 15) |
| L2_process | String | Process group within L1 |
| L3_activity | String | Activity cluster within L2 |
| onet_soc_codes | Array | O*NET Standard Occupational Classification codes |
| primary_roles | Array | Job titles that typically perform this task |
| importance | 1–5 | Business criticality rating |
| frequency | String | How often the task is performed |
| cognitive_complexity | 1–6 | Bloom's taxonomy level |
| regulatory_driven | Boolean | Whether driven by regulatory requirement |
| cross_functional | Boolean | Whether spans multiple functions |
| ai_exposure_class | E0/E1/E2 | Eloundou (2023) classification: E0 (no LLM exposure), E1 (direct LLM — 50%+ time reduction), E2 (LLM + tools — requires additional software) |
| agentic_potential | High/Med/Low | Forward-looking agentic AI potential: multi-step workflows, tool orchestration, autonomous operations |
| aei_success_rate | 0–100% | Anthropic Economic Index empirical success rate (where matched; null if no match) |
| ai_disposition | String | Automate, Augment, Restructure, No_Change |
| skills_required | Array | Key skills needed |
| defense_line | String | Risk governance (1st, 2nd, 3rd, NA) |
| source | String | Data provenance category |
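Since task_id encodes the taxonomy path, a small parser can recover the hierarchy from the ID alone. The segment semantics below are assumed from the schema example (RB.DEP.ACT.001):

```python
# Parse a task_id into its taxonomy components (four-segment pattern assumed).
def parse_task_id(task_id):
    l1, l2, l3, seq = task_id.split('.')
    return {"L1_code": l1, "L2_code": l2, "L3_code": l3, "sequence": int(seq)}

print(parse_task_id("RB.DEP.ACT.001"))
# {'L1_code': 'RB', 'L2_code': 'DEP', 'L3_code': 'ACT', 'sequence': 1}
```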
Hay Method Integration Framework

This inventory is designed to inform Korn Ferry Hay Method job evaluations for hierarchy redesign. The framework operates at two layers: task-derived metrics (from this inventory) and job-level organizational context (from your HRIS/org structure). Both are required for accurate Hay scoring — task composition alone cannot distinguish between an analyst and a VP performing similar analytical work.

Why two layers? The Hay Method evaluates the job, not just its tasks. Two roles can share identical task compositions but score very differently because of organizational context: a VP “developing strategic plans” operates with broader scope, higher decision authority, and greater accountability than an analyst doing the same task. The job-level context layer captures this “organizational amplifier” that task attributes alone cannot express.

Layer 1: Task-Derived Metrics (from this inventory)

| Hay Factor | Subfactor | Task Attribute Mapping | Formula / Approach |
|---|---|---|---|
| Know-How (knowledge & skill for competent performance) | Technical Depth | cognitive_complexity × skill breadth across role | Avg Bloom’s level × count of unique skills_required per role |
| | Managerial Breadth | Proportion of management/planning/directing tasks | Count of supervisory/strategic tasks ÷ total role tasks |
| | Human Relations | Proportion of interpersonal/advisory/coaching tasks | Count of client-facing or mentoring tasks ÷ total role tasks |
| Problem Solving (thinking required, as % of Know-How) | Thinking Environment | Derived from ai_exposure_class | E0 tasks = most unstructured, novel problems; E1 = structured enough for direct LLM assistance; E2 = amenable to AI tool pipelines. Higher proportion of E0 = more human judgment required. |
| | Thinking Challenge | cognitive_complexity (Bloom’s) | Level 5–6 (Evaluate/Create) = high challenge; Level 1–2 = low |
| Accountability (accountability for actions & consequences) | Freedom to Act | Inverse of regulatory_driven density | Roles with high regulatory load = more constrained freedom |
| | Scope / Magnitude | importance × frequency | Weighted average across role tasks, scaled by L1 function materiality |
| | Impact | defense_line + direct-impact proportion | 1st-line direct operations > 2nd-line oversight > 3rd-line assurance |

Layer 2: Job-Level Organizational Context (from your HRIS / org structure)

| Context Variable | Source | Hay Factor Impact | How It Modulates |
|---|---|---|---|
| Job Grade / Band | HRIS compensation data | All three factors | Serves as a validation anchor, not a direct input. Compare computed Hay scores against current grades to identify over/under-graded roles. Large gaps (>2 grades) flag misalignment. |
| Span of Control | Org chart (direct + indirect reports) | Know-How (Mgmt Breadth), Accountability (Scope) | Multiplier: 0 reports = IC baseline; 1–5 = team lead (+15%); 6–20 = manager (+30%); 20+ = senior leader (+50%). Applied to managerial breadth and scope/magnitude subfactors. |
| Decision Authority Level | Delegation of Authority matrix, approval limits | Accountability (Freedom to Act) | Maps approval thresholds to Hay freedom-to-act scale: prescribed (<$10K) → controlled ($10K–$1M) → standardized ($1M–$50M) → broadly defined ($50M+) → strategic direction (enterprise) |
| Budget / Revenue Responsibility | Financial planning data, P&L ownership | Accountability (Scope / Magnitude) | Hay uses geometric progression: each level ~15% larger. Map to Hay magnitude scale using ln(budget) normalization. Cost-center roles score lower than revenue-generating roles at equivalent dollar levels. |
| Reporting Level | Org chart (levels from CEO) | Problem Solving (Thinking Environment) | Fewer levels from CEO = less structured thinking environment, more strategic ambiguity. Maps to Hay thinking environment scale: semi-routine (6+) → patterned (4–5) → variable (3) → broadly defined (2) → abstractly defined (1) |
| Cross-Functional Accountability | Committee memberships, dotted-line reports, project governance roles | Know-How (Mgmt Breadth), Problem Solving | Roles accountable across multiple L1 functions score higher on managerial breadth. Count of L1 functions in scope: 1 = activity (+0), 2–3 = diverse (+15%), 4+ = broad (+30%) |
How to use for job redesign:
1. Aggregate task attributes to the role level using importance-weighted averages (Layer 1).
2. Overlay job-level context variables from HRIS (Layer 2).
3. Compute composite Hay factor scores using the combined two-layer model.
4. Compare against current grades — discrepancies reveal misgraded roles.
5. Cluster roles into job families by L1 function and similar Hay profiles.
6. Assign job levels based on Hay composite score bands, using Hay’s ~15% geometric step progression (a minimal banding sketch follows below).
7. Model the future state: remove automated tasks, recompute Layer 1, hold Layer 2 constant, and identify which roles shift levels.
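The band boundaries below are illustrative, not Hay’s published point values; the sketch only shows how a ~15% geometric step generates successive grade bands:

```python
# Generate illustrative grade bands from a ~15% geometric step.
# The base score and band count are assumptions, not Hay's published values.
def hay_bands(base=30.0, step=1.15, n_bands=6):
    bounds = [round(base * step ** i, 1) for i in range(n_bands + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

for i, (lo, hi) in enumerate(hay_bands(), start=1):
    print(f"Band {i}: {lo:.1f} - {hi:.1f}")
```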
Important: This is a reference model based on external data. It is not derived from any specific institution's internal data. Scores should be validated against your organization's actual operating model.
Task Explorer

Filter, search, and drill into 4,075 financial services tasks. Click any row to expand full details.

Filters: Business Function · AI Disposition · AI Exposure Class · Agentic Potential · Bloom's Complexity · Defense Line

Table columns: ID · Task · Function · Disposition · E-Class · Agentic · Bloom
Mapping to Organizational Roles

A practical guide for connecting this reference inventory to your organization's actual job architecture.

This inventory uses generic role titles. Your organization will have different titles, structures, and task bundles. The mapping process below helps you translate between the two — revealing how tasks cluster into roles, what skills each role requires, and where role boundaries may need to shift.
1

Build Your Role–Task Alignment

List your actual job titles within each business function. For each role, identify which L2 processes and L3 activities they touch, and estimate the percentage of effort in each area.

Example

Your Role: "Client Service Associate — Branch"
L1: Retail Banking
L2 Processes: Deposit Products (60%), Consumer Lending (25%), Branch Sales (15%)
Reference Tasks: ~45 tasks from those L3 activities apply
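A small validation sketch for the alignment worksheet, assuming columns job_title, L2_process, and effort_pct (the rows mirror the example above):

```python
import pandas as pd

# Worksheet rows mirror the example above; effort_pct must total 100 per role.
mapping = pd.DataFrame([
    {"job_title": "Client Service Associate - Branch", "L2_process": "Deposit Products", "effort_pct": 60},
    {"job_title": "Client Service Associate - Branch", "L2_process": "Consumer Lending", "effort_pct": 25},
    {"job_title": "Client Service Associate - Branch", "L2_process": "Branch Sales", "effort_pct": 15},
])
totals = mapping.groupby("job_title")["effort_pct"].sum()
bad = totals[totals != 100]
assert bad.empty, f"Effort does not sum to 100% for: {list(bad.index)}"
print("All role-effort allocations sum to 100%.")
```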

2

Build a Role Complexity & Skills Profile

For each role, aggregate the task-level attributes to understand the role's overall character.

What You Can Derive

Complexity Profile: Distribution of Bloom's levels across the role's tasks — is this a primarily execution role (Bloom 1–2), analytical role (3–4), or strategic role (5–6)?

Skills Footprint: Union of all skills_required across the role's tasks — what is the full capability set this role demands?

Regulatory Burden: What percentage of the role's tasks are regulatory-driven? This affects change velocity and training requirements.

3

Analyze Role Composition

With tasks mapped and profiled, several analyses become possible:

  • Task Overlap: Which roles share significant task overlap? These may be candidates for consolidation or clearer boundary definition.
  • Skill Adjacency: Which roles share skill requirements? These form natural job families and career mobility paths.
  • Complexity Span: Does a role bundle tasks across too wide a Bloom's range? Roles spanning 4+ levels may need to split into tiered positions.
  • AI Exposure (Optional): Overlay AI exposure classes (E0/E1/E2) and dispositions to understand which tasks within a role are most affected by technology change.
Tip: Export filtered task data from the Explorer, then map against your HRIS headcount data to produce headcount-weighted profiles for each role.
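A minimal sketch of that tip, assuming an Explorer export already joined to your role-task map and an HRIS extract with job_title and fte_count (file and column names are assumptions):

```python
import pandas as pd

# File and column names are assumptions; adapt to your exports.
tasks = pd.read_csv("explorer_export_mapped.csv")  # task rows with a job_title column
hris = pd.read_csv("hris_headcount.csv")           # job_title, fte_count

merged = tasks.merge(hris, on="job_title")
profile = merged.groupby("job_title").agg(
    fte=("fte_count", "first"),
    mean_bloom=("cognitive_complexity", "mean"),
    pct_regulatory=("regulatory_driven", "mean"),
)
# FTE-weighted aggregate, so large populations count proportionally
sector_bloom = (profile.fte * profile.mean_bloom).sum() / profile.fte.sum()
print(profile.head())
print(f"FTE-weighted mean Bloom's: {sector_bloom:.2f}")
```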
Strategic Workforce Planning Integration

How to integrate task-level data into your SWP cycle — for skills planning, capacity modeling, and organizational change.

1

Assess Current State

Use Role Mapping to establish a baseline of your workforce's task composition.

  • Pull headcount from HRIS by job title and function
  • Map job titles to reference tasks using the Role Mapping framework
  • Profile each role's complexity distribution, skill requirements, and regulatory burden
  • Identify roles with the highest task diversity (spanning many L2 processes) — these are your most complex workforce planning targets
2

Identify Gaps & Risks

Compare the desired future state against current capabilities across multiple dimensions.

  • Skills Gap: Which skills appear in high-complexity tasks but are underrepresented in your current workforce?
  • Capacity Risk: Are critical tasks concentrated in too few roles or individuals? What happens if those roles turn over?
  • Regulatory Exposure: Which roles carry heavy regulatory task loads? These require specialized succession planning.
  • Technology Impact: Use AI exposure classes (E0/E1/E2) and dispositions to identify which tasks (and therefore roles) are most affected by technology change, including AI adoption.
3

Model Scenarios

Build scenarios to bound workforce evolution under different strategic assumptions.

Scenario Dimensions

Organizational Change: What if you consolidate roles within an L2 process? Model headcount and skill implications.

Technology Adoption: What if tasks classified E1 with an Automate disposition are automated within 24 months? Where does freed capacity go?

Regulatory Shift: What if new regulations add compliance tasks? Which roles absorb the load?
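As a sketch of the Technology Adoption scenario above, assuming a role-task map with effort weights and a task table with dispositions (file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical inputs; adapt file and column names to your environment.
rt = pd.read_csv("role_task_mapping.csv")   # job_title, task_id, effort_pct
tasks = pd.read_csv("task_inventory.csv")   # task_id, ai_disposition, ...

merged = rt.merge(tasks[["task_id", "ai_disposition"]], on="task_id")
automated = merged[merged.ai_disposition == "Automate"]
freed = automated.groupby("job_title")["effort_pct"].sum().sort_values(ascending=False)
print(freed.head(10))  # roles with the most capacity released, in effort-%
```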

4

Implement & Monitor

Execute workforce transitions with measurable indicators.

  • Leading: Reskilling enrollment, internal mobility rate, time-to-fill for redesigned roles
  • Lagging: Productivity per role, cost-to-serve, customer satisfaction, error rates
  • Governance: Monthly reviews, cross-functional steering, union consultation where applicable
Job Hierarchy Redesign

Using the task inventory to rethink how roles, job families, and organizational layers are structured — grounded in the Hay Method for job evaluation.

The task inventory reveals what people actually do at the L4 level. This makes it possible to challenge existing job boundaries, identify where roles can be consolidated or split, and design a future-state hierarchy grounded in real task clusters rather than inherited org charts.

The Problem with Current Job Hierarchies

Most financial services job hierarchies evolved organically — roles were added, titles inflated, and boundaries hardened around legacy processes. When the underlying work changes (through technology, regulation, or market shifts), the hierarchy itself may no longer reflect the actual nature of the work being done. Two symptoms emerge:

Fragmented Roles

A single end-to-end process is split across 3–5 job titles, each owning a narrow slice. The result: duplicated skills, unclear accountability, and roles that lack the critical mass to justify a distinct grade.

Bloated Roles

A single title bundles unrelated tasks from different L2 processes. The role holder is a generalist by accident, not design — making it difficult to evaluate the role consistently or plan career progression.

The Hay Method & Task-Level Data

The Hay Method (Korn Ferry) is the most widely used job evaluation framework in financial services. It evaluates jobs on three core factors: Know-How, Problem Solving, and Accountability. Traditionally, these are assessed through job descriptions and interviews — a subjective, time-consuming process. The task inventory provides an empirical foundation for each factor.

Know-How

The sum of knowledge, skills, and experience required to perform the job competently.

Inventory fields that inform Know-How:

  • skills_required — directly enumerates the technical and interpersonal skills each task demands
  • cognitive_complexity — Bloom's level indicates the depth of knowledge application (recall vs. analysis vs. creation)
  • regulatory_driven — regulatory tasks typically require specialized, certified knowledge (AML, OSFI, Basel)
  • onet_soc_codes — links to O*NET's detailed knowledge and education requirements per occupation

Hay Application

Aggregate skills_required across all tasks in a role to measure the breadth of know-how. Use the maximum Bloom's level to gauge depth. Count distinct L2 processes to assess management breadth.

Problem Solving

The thinking required to analyze, evaluate, reason, and arrive at conclusions within the job's environment.

Inventory fields that inform Problem Solving:

  • cognitive_complexity — Bloom's taxonomy directly measures thinking demand: levels 1–2 (routine/guided), 3–4 (analytical/applied), 5–6 (evaluative/creative)
  • task_description — verb patterns reveal the thinking environment (e.g., "execute" = well-defined; "assess" = semi-variable; "design strategy" = abstract)
  • cross_functional — cross-functional tasks require navigating ambiguity across organizational boundaries
  • ai_exposure_class — E0 tasks tend to involve more novel, unstructured thinking; E1/E2 tasks have more structured, LLM-amenable components

Hay Application

Map the role's Bloom's distribution to Hay's Thinking Challenge scale. Use the proportion of cross-functional tasks to assess the Thinking Environment (how much guidance or precedent exists).

Accountability

The answerability for actions and their consequences — encompassing freedom to act, magnitude of impact, and directness of impact.

Inventory fields that inform Accountability:

  • defense_line — 1st line (direct execution/ownership), 2nd line (oversight/monitoring), 3rd line (independent assurance) map directly to freedom-to-act levels
  • importance — business criticality rating (1–5) indicates the magnitude of impact if the task fails
  • regulatory_driven — regulatory tasks carry external accountability to supervisors, auditors, and regulators
  • L1_function / L2_process — the organizational scope of the task indicates whether impact is local (branch) or enterprise-wide

Hay Application

Use defense_line to assign Freedom to Act. Weight importance scores by frequency to calculate Magnitude. Assess whether the role's tasks have direct (1st line) or indirect/advisory (2nd/3rd line) impact.

Four-Phase Hierarchy Redesign Process

1

Task Cluster Analysis

Start by grouping tasks from the inventory into natural clusters based on shared attributes, rather than inheriting current role boundaries.

  • By L2 Process: Which tasks belong together because they serve the same process end-to-end?
  • By Cognitive Complexity: Separate high-judgment (Bloom 4–6) from routine execution (Bloom 1–2) — these correspond to different Hay grades and should often be different roles.
  • By Skills Required: Tasks sharing common skill profiles are natural candidates for a single role family. Shared skills = shared Know-How = same Hay job family.
  • By Defense Line: Tasks on different defense lines carry fundamentally different accountability profiles and should not be combined in the same role.
Use the Explorer to filter by L1, Bloom's level, and defense line. Export the results and sort by skills_required to see natural groupings emerge. Each cluster is a candidate role.
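A sketch of that workflow, assuming the exported CSV serializes skills_required as a Python-style list literal (adjust the parser if your export uses JSON or delimiters):

```python
import ast
import pandas as pd

tasks = pd.read_csv("explorer_export.csv")  # filtered by L1, Bloom's, defense line
# Normalize the stringified skill list to a sorted tuple (an exact-signature key);
# fuzzier clustering would use Jaccard distance over the skill sets instead.
tasks["skill_sig"] = tasks["skills_required"].apply(
    lambda s: tuple(sorted(ast.literal_eval(s))))
clusters = (tasks.groupby(["L2_process", "skill_sig"])
                 .task_id.count()
                 .sort_values(ascending=False))
print(clusters.head(15))  # largest shared-skill clusters = candidate roles
```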
2

Role Boundary Redefinition

Once task clusters are identified, draw new role boundaries around them and evaluate each using Hay criteria:

  • Critical Mass Test: Does the cluster contain enough tasks to justify a full-time position? If not, merge with an adjacent cluster that shares the same Know-How profile.
  • Hay Coherence: Do all tasks in the proposed role land within 1–2 Bloom levels (Problem Solving), the same defense line (Accountability), and overlapping skill sets (Know-How)? If not, the role is trying to span too many Hay grades.
  • Span of Complexity: Roles spanning more than 2–3 Bloom levels should split into tiered positions (e.g., Analyst vs. Senior Analyst). This directly maps to different Hay evaluation points.
  • Cross-Functional Alignment: Tasks flagged as cross_functional=true may indicate roles that should sit in a shared service or center of excellence, which changes the Accountability profile (broader magnitude, more indirect impact).

Example: Retail Lending Hierarchy

Current: Mortgage Intake Clerk → Mortgage Processor → Underwriter → Closing Coordinator → Post-Close Auditor (5 roles, 3 layers)

Hay Analysis: Intake and Processing tasks are Bloom 1–2 with overlapping skills. Underwriting is Bloom 4 with distinct regulatory know-how. Audit is 3rd-line with different accountability. Three natural Hay clusters, not five.

Redesigned: Origination Advisor (client-facing, Bloom 4–5, 1st line) + Lending Operations Specialist (process + exception, Bloom 2–3, 1st line) + Credit Risk Reviewer (2nd/3rd line, Bloom 4–5). Three roles, two layers, each internally coherent against Hay criteria.

3

Job Family & Career Level Architecture

Organize the new roles into job families and define Hay-aligned career progression:

  • Job Families: Group roles by shared Know-How domains (skill overlap). Families might be “Client Advisory,” “Risk & Control,” “Data & Intelligence,” “Regulatory Operations” — defined by the skills that their constituent tasks share.
  • Career Levels (Hay Grades): Within each family, Bloom's levels provide a natural grading structure. Bloom 1–2 tasks define entry/associate grades (lower Know-How, guided Problem Solving). Bloom 3–4 define mid-level grades (analytical Problem Solving, broader Accountability). Bloom 5–6 define senior/leadership grades (evaluative/creative thinking, enterprise-wide impact).
  • Progression Paths: Career mobility between levels is defined by which new tasks the next level adds. This makes promotion criteria objective: can the person perform the higher-Bloom tasks that define the next grade?
  • Compensation Banding: Hay evaluation points (derived from task-level Know-How, Problem Solving, and Accountability) provide a defensible, data-backed foundation for pay banding rather than market-matching by title alone.
4

Transition Mapping & Governance

The new hierarchy is a target state. Getting there requires managed transitions:

  • Current → Future Role Map: For each existing role, define the target role(s) it maps to. Export the Role Mapping Template from the Export Center and populate with your org's current titles.
  • Hay Re-Evaluation: Use the task data to draft Hay evaluation profiles for each new role. This accelerates the traditionally manual evaluation process because the Know-How, Problem Solving, and Accountability inputs are already captured in the inventory.
  • Skill Gap Analysis: Compare skills_required of the future role against current role holders. The delta defines training and reskilling needs.
  • Phased Rollout: Sequence by business impact. Start with functions where role fragmentation or bloat is most severe, and where the Hay re-evaluation reveals the largest gap between current grading and task-based grading.
  • Governance: Hierarchy redesign crosses HR, Compensation, business lines, and risk. Establish a cross-functional steering group with sign-off authority on role and grade changes.
Key Principle: The hierarchy should be designed around task clusters evaluated against Hay criteria — not inherited titles or current headcount. Let the tasks define the roles, and let the task attributes define the grades.

Using the Inventory for Hay-Aligned Hierarchy Analysis

Identify Consolidation Opportunities

Filter by L2 process and examine how many distinct primary_roles appear. If 4+ roles share the same L2, similar Bloom levels, and overlapping skills, they occupy the same Hay territory and consolidation is likely warranted.
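A sketch of that filter, assuming the full JSON export from the Export Center (file name hypothetical):

```python
import json
import pandas as pd

with open("task_inventory.json") as f:
    tasks = pd.DataFrame(json.load(f)["tasks"])

# L2 processes where 4+ roles cluster within a narrow Bloom's range:
# candidates for consolidation review.
exploded = tasks.explode("primary_roles")
candidates = (exploded.groupby("L2_process")
              .agg(n_roles=("primary_roles", "nunique"),
                   bloom_spread=("cognitive_complexity", lambda x: int(x.max() - x.min())))
              .query("n_roles >= 4 and bloom_spread <= 2")
              .sort_values("n_roles", ascending=False))
print(candidates)
```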

Detect Grade Misalignment

Sort by cognitive_complexity within an L1 function. If roles at adjacent Bloom levels have identical task types and defense lines, they may be graded differently but doing the same work — a Hay evaluation would merge them.

Map Know-How Clusters

Export tasks and group by skills_required. Roles that share >70% of their skill footprint belong in the same job family. Roles that share <30% may be misclassified in the current hierarchy.
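A minimal sketch of the footprint comparison, with invented roles and skills; the 70%/30% thresholds come from the guidance above:

```python
from itertools import combinations

# Hypothetical skill footprints (role -> set of skills), built as in
# Role Mapping step 2; titles and skills here are illustrative.
footprints = {
    "Credit Analyst": {"Credit Analysis", "Financial Modeling", "Writing"},
    "Senior Credit Analyst": {"Credit Analysis", "Financial Modeling", "Negotiation"},
    "Branch Advisor": {"Client Advisory", "Product Knowledge"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

for r1, r2 in combinations(footprints, 2):
    o = jaccard(footprints[r1], footprints[r2])
    family = "same family" if o > 0.70 else ("different family" if o < 0.30 else "review")
    print(f"{r1} <-> {r2}: {o:.0%} ({family})")
```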

Validate Accountability Structures

Ensure the redesigned hierarchy maintains separation of duties. No role should mix 1st-line and 2nd/3rd-line tasks — the defense_line field makes this auditable and maps directly to Hay's Freedom to Act dimension.

Summary: The task inventory provides the raw material that Hay evaluations require — but captured systematically at scale rather than through role-by-role interviews. By aggregating task-level Know-How (skills, Bloom's, regulatory knowledge), Problem Solving (Bloom's distribution, cross-functional scope), and Accountability (defense line, importance, impact scope), organizations can draft Hay-aligned role evaluations directly from the data, dramatically accelerating the job architecture redesign process.
Implementation Runbook

How to recreate and extend this analysis internally, combining your organization's proprietary data with the external reference inventory.

This inventory was built entirely from external, publicly available data. Its value as a reference model is that it provides a validated starting point — a comprehensive task taxonomy, scoring engine, and enrichment schema — that any organization can adapt without starting from scratch. The sections below describe exactly how to do that.

Reusable Artifacts from This Analysis

The following outputs from this project can be used directly in your internal implementation. They represent significant upfront work that does not need to be repeated:

4-Level Taxonomy Structure

The hierarchy of 15 L1 Functions → 164 L2 Processes → 486 L3 Activities provides a ready-made classification framework. Your organization can adopt it as-is or modify branches to reflect your specific operating model.

Export: Full JSON from the Export Center. Extract unique L1/L2/L3 combinations to get the taxonomy tree.

Reference Task Library (4,075 tasks)

Each task is a verb-object statement with a full description, skills, roles, and classification metadata. Use as a starting checklist: walk through each L3 activity and confirm which tasks exist in your org, which need rewording, and which are missing.

Export: Full CSV. Filter by L1 function to produce function-specific worksheets for SME validation.

20-Field Task Schema

The schema (task_id, task_name, task_description, L1–L3, SOC codes, roles, importance, frequency, Bloom's, regulatory, cross-functional, AI exposure class, agentic potential, AEI success rate, disposition, skills, defense line, source) is designed for analytical versatility. Adopt it as your internal data standard.

Export: JSON schema is self-documenting. See the Methodology tab for field definitions.

AI Exposure Scoring Engine

Categorical E0/E1/E2 task classification per Eloundou et al. (2023, Science). Occupation-level Eloundou β measure for aggregation. Published AIOE z-scores (Felten et al. 2021) available for cross-validation. Full Python implementation in the Data Scientist Runbook below.

Export: The scoring logic is documented in the Methodology tab. Complete Python code in the runbook (Step 4).

O*NET SOC Code Mappings

Each task is linked to O*NET Standard Occupational Classification codes, connecting the inventory to the U.S. Department of Labor's occupational database (knowledge requirements, education levels, wage data, projected growth).

Export: SOC codes are included in every CSV/JSON export. Cross-reference against the free O*NET 30.2 database.

Hay Method Mapping Framework

The Job Hierarchy Redesign tab documents how inventory fields map to Hay's three evaluation factors (Know-How, Problem Solving, Accountability). This mapping template accelerates Hay-aligned job architecture work.

Export: Conceptual framework documented in the Redesign tab. Apply it to your org-specific task data.

Internal Data Sources to Integrate

To move from a reference model to an org-specific analysis, you need to overlay your proprietary data. Here are the key internal sources and what they contribute:

| Internal Source | What It Provides | How It Integrates |
|---|---|---|
| HRIS / Workday | Job titles, headcount, grades, compensation bands, reporting lines, org structure | Map job titles to reference tasks (Role Mapping step 1). Headcount-weight the analysis to show FTE impact, not just task count. |
| Job Descriptions (JDs) | Official role responsibilities, qualifications, competency requirements | Validate and customize the reference task list. Add org-specific tasks not in the external inventory. Confirm Bloom's levels match internal expectations. |
| Process Maps / SOPs | Documented workflows, system touchpoints, handoff points | Validate L2/L3 taxonomy alignment. Identify tasks that are split across roles differently than the reference model assumes. |
| Time & Motion / Activity Data | How staff actually spend their time (if available from workforce analytics tools) | Replace estimated effort weights with actual observed data. This is the single highest-value internal dataset for this analysis. |
| Learning Management System (LMS) | Training records, certifications, competency assessments | Map to skills_required to identify existing capability vs. gaps. Feeds directly into SWP skill gap analysis. |
| Hay / Korn Ferry Evaluations | Existing job evaluation scores, grade structures, point profiles | Compare current Hay grades against the task-derived grades from the Hierarchy Redesign framework. Identify misalignment between current grading and actual task composition. |
| Incident / Issue Registers | Operational errors, compliance findings, audit issues | Correlate with task-level data to identify which tasks (and therefore roles) are highest risk. Informs importance scoring and defense line validation. |
| Technology Inventory | Systems, platforms, automation tools currently in use | Informs the “current digitization” scoring factor. Tasks performed on modern platforms score higher for AI readiness. |

External Data Sources (Publicly Available)

These are the external sources used to build this reference inventory. All are freely or commercially available:

| Source | Access | What to Extract |
|---|---|---|
| O*NET 30.2 Database | Free download: onetonline.org | Task statements, knowledge domains, skills, abilities, education requirements, and wage data for 1,000+ occupations. Filter by SOC codes relevant to financial services (13-xxxx, 15-xxxx, 43-xxxx). |
| Regulatory Frameworks | FINTRAC, OSFI, Basel III/IV, IFRS 9, TCFD — all published online | Compliance obligations that generate regulatory-driven tasks. These define the “non-negotiable” task layer that cannot be eliminated. |
| Professional Certifications | CFA Institute, GARP (FRM), (ISC)² (CISSP), FINRA | Certification body-of-knowledge outlines define the skill and knowledge standards for professional roles. Use to validate skills_required fields. |
| Industry Job Postings | Careers pages, Indeed, LinkedIn, Glassdoor | Real-world role descriptions and responsibilities. Useful for validating that the inventory covers actual market roles (see the BMO Coverage Analysis for an example of this validation). |
| Bloom’s Taxonomy Reference | Standard educational framework (widely published) | Provides the 6-level cognitive complexity scale: Remember (1), Understand (2), Apply (3), Analyze (4), Evaluate (5), Create (6). Used to score each task. |

Data Scientist Runbook

The following is a step-by-step technical guide for a data scientist to build the internal integration pipeline. Each step includes the inputs, outputs, and pseudocode logic.

1

Load & Validate the Reference Inventory

Start by loading the reference data and confirming its structure.

```python
# Step 1: Load reference inventory
import json
import pandas as pd

with open('task_inventory.json') as f:
    ref = json.load(f)
ref_tasks = pd.DataFrame(ref['tasks'])
print(f"Reference: {len(ref_tasks)} tasks, {ref_tasks.L1_function.nunique()} L1 functions")
print(f"Schema: {list(ref_tasks.columns)}")

# Validate: no nulls in key fields
assert ref_tasks[['task_id', 'task_name', 'L1_function',
                  'L2_process', 'L3_activity']].notna().all().all()

# Extract taxonomy tree for reuse
taxonomy = (ref_tasks[['L1_function', 'L2_process', 'L3_activity']]
            .drop_duplicates()
            .sort_values(['L1_function', 'L2_process', 'L3_activity']))
```
2

Load Internal HRIS Data & Build Role–Task Map

Pull your HRIS export and map each internal job title to reference tasks. This is the most labor-intensive step and typically requires SME input.

```python
# Step 2: Load HRIS and build role-task mapping
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

hris = pd.read_csv('hris_export.csv')
# columns: employee_id, job_title, department, grade, fte_count
print(f"HRIS: {hris.job_title.nunique()} unique titles, {hris.fte_count.sum()} FTEs")

# Option A: Manual mapping worksheet (SME-assisted)
# Export reference tasks by L1, have SMEs mark which tasks apply to each job title
role_task_map = pd.read_csv('role_task_mapping.csv')
# columns: job_title, task_id, effort_pct

# Option B: Automated fuzzy matching (augments manual mapping)
# Match JD text against reference task descriptions
jd_titles, jd_texts = load_job_descriptions()  # your JD corpus: parallel lists of titles and text
ref_descs = ref_tasks['task_description'].tolist()

vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
tfidf = vectorizer.fit_transform(ref_descs + jd_texts)
ref_vecs = tfidf[:len(ref_descs)]
jd_vecs = tfidf[len(ref_descs):]
similarities = cosine_similarity(jd_vecs, ref_vecs)

# For each JD, keep the top-N matching reference tasks above threshold
THRESHOLD = 0.25
suggested = []
for i, jd_title in enumerate(jd_titles):
    top_matches = similarities[i].argsort()[::-1][:20]
    suggested.extend(
        (jd_title, ref_tasks.iloc[j].task_id, similarities[i][j])
        for j in top_matches if similarities[i][j] > THRESHOLD
    )
suggested_df = pd.DataFrame(suggested, columns=['job_title', 'task_id', 'similarity'])
```
3

Add Org-Specific Tasks

The reference inventory covers the industry broadly but won't capture every organization-specific process. Add custom tasks using the same schema.

```python
# Step 3: Add org-specific tasks
# Use the same 20-field schema
custom_tasks = []
custom_tasks.append({
    'task_id': 'CUSTOM.RB.001',  # prefix with CUSTOM to distinguish
    'task_name': 'Process Internal Transfer via Proprietary Platform',
    'task_description': 'Execute inter-branch account transfers using [OrgSystem]...',
    'L1_function': 'Retail Banking',
    'L2_process': 'Deposit Products & Services',
    'L3_activity': 'Transaction Processing',
    'onet_soc_codes': ['43-3071.00'],
    'primary_roles': ['Branch Operations Specialist'],
    'importance': 3,
    'frequency': 'Daily',
    'cognitive_complexity': 2,     # Bloom's level
    'regulatory_driven': False,
    'cross_functional': False,
    'ai_exposure_class': None,     # will be classified in Step 4
    'agentic_potential': None,     # forward-looking flag, assigned separately
    'aei_success_rate': None,      # null unless matched to AEI data
    'ai_disposition': None,
    'skills_required': ['Core Banking System', 'Transaction Processing'],
    'defense_line': '1st',
    'source': 'Internal'
})

# Merge with reference
all_tasks = pd.concat([ref_tasks, pd.DataFrame(custom_tasks)], ignore_index=True)
```
4

Apply the AI Exposure Classification (E0/E1/E2)

Classify tasks using the categorical rubric from Eloundou et al. (2023). Each task is assigned E0 (no LLM exposure), E1 (direct LLM — 50%+ time reduction), or E2 (LLM + tools). Use the Eloundou β measure for role-level aggregation. Published AIOE z-scores (Felten et al. 2021) are available at the occupation level for cross-validation.

```python
# Step 4: AI Exposure Scoring — Bottom-Up E0/E1/E2 Classification
# References:
#   Eloundou et al. (2023) "GPTs are GPTs", Science 384(6702)
#   Felten, Raj & Seamans (2021) AIOE Index, Strategic Mgmt Journal
import re
import pandas as pd

# ── 4a. Define keyword patterns for E0/E1/E2 classification ──
# Patterns are lowercase because task text is lowercased before matching.
E0_PATTERNS = [  # Physical, embodied, in-person tasks
    r'(physical|manual|hands-on|in-person|face-to-face|on-site|field)',
    r'(vault|cash handling|cash count|coin|currency|safe|lock|key)',
    r'(inspect|patrol|guard|security check|emergency response)',
    r'(lift|carry|move|transport|deliver|install|repair|maintain equipment)',
    r'(branch operations|teller|counter|window)',
]
E0_STRONG = [  # High-confidence E0 signals
    r'(physically|manual labor|operate machinery|handle cash|count currency)',
    r'(vault operations|security patrol|emergency evacuation)',
]
E1_PATTERNS = [  # Direct LLM tasks (50%+ time reduction with LLM alone)
    r'(write|draft|compose|author|prepare report|create document)',
    r'(summarize|synthesize|consolidate|abstract)',
    r'(review|proofread|edit|revise)',
    r'(code|script|program|develop software|debug)',
    r'(classify|categorize|tag|label)',
    r'(research|investigate|compile information|literature review)',
    r'(explain|describe|articulate|communicate)',
    r'(recommend|suggest|advise|propose)',
    r'(policy|procedure|guideline|standard|template)',
    r'(forecast|project|predict|estimate)',
    r'(train|educate|instruct|onboard)',
]
E2_PATTERNS = [  # LLM + tools (data analysis, system integration)
    r'(analyze data|data analysis|statistical|regression|model)',
    r'(database|sql|query|extract data|data mining)',
    r'(dashboard|visualization|chart|graph|report generation)',
    r'(monitor|track|alert|detection|surveillance)',
    r'(automate|workflow|process automation|rpa)',
    r'(reconcile|match|validate data|cross-reference)',
    r'(risk model|credit model|scoring model|pricing model)',
    r'(compliance monitoring|regulatory reporting|filing)',
    r'(transaction processing|settlement|clearing)',
    r'(due diligence|kyc|aml|screening)',
]

# ── 4b. Task-level classification function ──
def classify_task_e012(task_text, bloom_level):
    """Classify a task as E0, E1, or E2 per the Eloundou rubric.

    Returns (classification, confidence)."""
    text = task_text.lower()
    e0 = sum(1 for p in E0_PATTERNS if re.search(p, text))
    e0s = sum(1 for p in E0_STRONG if re.search(p, text))
    e1 = sum(1 for p in E1_PATTERNS if re.search(p, text))
    e2 = sum(1 for p in E2_PATTERNS if re.search(p, text))

    # Strong E0 signal overrides
    if e0s >= 1 and e1 == 0 and e2 == 0:
        return 'E0', 0.9

    # Bloom's modulates: higher-order thinking = more LLM-amenable
    bloom_boost = max(0, (bloom_level - 2) * 0.1) if bloom_level else 0
    total = max(e0 + e1 + e2, 1)
    scores = {
        'E0': (e0 / total) * (1 - bloom_boost),
        'E1': (e1 / total) + bloom_boost * 0.6,
        'E2': (e2 / total) + bloom_boost * 0.4,
    }
    best = max(scores, key=scores.get)
    if best == 'E0' and e0 == 0:
        best = 'E1'  # default to E1 if no E0 signals
    return best, min(0.9, scores[best])

# ── 4c. Disposition assignment (based on E-class, Bloom's, and regulatory status) ──
def assign_disposition(e_class, bloom, regulatory):
    """Assign Automate/Augment/Restructure/No_Change using E-class + task attributes."""
    # E0 tasks: no LLM exposure
    if e_class == 'E0':
        return 'No_Change' if bloom >= 5 else 'Restructure'
    # E1 tasks: direct LLM exposure
    if e_class == 'E1':
        if bloom <= 2:
            return 'Automate'       # routine tasks the LLM can handle directly
        elif bloom <= 4:
            return 'Augment' if regulatory else 'Automate'
        else:
            return 'Augment'        # high-order tasks: human + LLM together
    # E2 tasks: LLM + tools
    if e_class == 'E2':
        if bloom <= 2:
            return 'Automate'       # tool pipelines can handle routine E2 tasks
        elif bloom <= 3:
            return 'Restructure'    # may require workflow redesign for tooling
        else:
            return 'Augment'        # complex tasks benefit from AI-augmented workflows
    return 'Augment'

# ── 4d. Apply to all tasks ──
for idx, row in all_tasks.iterrows():
    text = f"{row['task_name']} {row['task_description']}"
    bloom = row['cognitive_complexity']
    e_class, conf = classify_task_e012(text, bloom)
    all_tasks.at[idx, 'ai_exposure_class'] = e_class
    all_tasks.at[idx, 'ai_disposition'] = assign_disposition(
        e_class, bloom, row['regulatory_driven'])

print(f"Classification: {all_tasks.ai_exposure_class.value_counts().to_dict()}")

# ── 4e. Compute Eloundou beta at occupation level ──
# beta = [E1 + 0.5 * E2] / total tasks for each occupation
e_map = {'E0': 0, 'E1': 1, 'E2': 0.5}
all_tasks['e_weight'] = all_tasks['ai_exposure_class'].map(e_map)
occ_beta = (all_tasks.explode('onet_soc_codes')
            .groupby('onet_soc_codes')
            .agg(beta=('e_weight', 'mean'), task_count=('task_id', 'count'))
            .reset_index())
occ_beta.columns = ['soc', 'beta', 'task_count']
print("\nOccupation-level beta:")
print(f"  Mean beta: {occ_beta.beta.mean():.3f}")
print(f"  Range: {occ_beta.beta.min():.3f} - {occ_beta.beta.max():.3f}")

# ── 4f. CROSS-VALIDATE against published AIOE (Felten et al. 2021) ──
# Download AIOE data: github.com/AIOE-Data/AIOE
# The AIOE is an occupation-level z-score index (not task-level).
# We compare our occupation-level beta against AIOE for directional alignment.
aioe_df = pd.read_excel('AIOE_DataAppendix.xlsx', sheet_name='Appendix A')
aioe_map = dict(zip(aioe_df['O*NET-SOC Code'], aioe_df['AIOE']))
occ_beta['aioe'] = occ_beta['soc'].map(aioe_map)
occ_valid = occ_beta.dropna(subset=['aioe'])

# Spearman rank correlation (appropriate for comparing ordinal/z-score vs proportion)
from scipy.stats import spearmanr
rho, p_val = spearmanr(occ_valid['beta'], occ_valid['aioe'])
print(f"\nVALIDATION: Spearman rho = {rho:.3f} (p = {p_val:.4f})")
print(f"Matched {len(occ_valid)} occupations")
print("Note: AIOE is an occupation-level z-score; beta is a task-derived proportion.")
print("Directional agreement (both ranking occupations similarly) is the goal.")
```
5

Build Role-Level Profiles & Hay Method Evaluation

Aggregate task data to role level and compute Hay Method factor proxies for job hierarchy redesign. The three Hay factors (Know-How, Problem Solving, Accountability) are derived from task attributes using importance-weighted aggregation.

# Step 5: Role-Level Aggregation & Hay Method Factor Computation
import math

merged = role_task_map.merge(all_tasks, on='task_id')
merged = merged.merge(
    hris[['job_title','fte_count','grade','direct_reports',
          'indirect_reports','levels_from_ceo','approval_limit',
          'budget_responsibility','l1_functions_in_scope']].drop_duplicates(),
    on='job_title'
)
# NOTE: If your HRIS doesn't have all columns, fill with defaults:
# merged['direct_reports'] = merged.get('direct_reports', 0)
# merged['approval_limit'] = merged.get('approval_limit', 10000)
# merged['levels_from_ceo'] = merged.get('levels_from_ceo', 6)

# ── 5a. Basic role aggregation (Layer 1: Task-Derived) ──
role_profiles = merged.groupby('job_title').agg(
    task_count=('task_id', 'count'),
    fte_count=('fte_count', 'first'),
    current_grade=('grade', 'first'),
    # Carry HRIS context through to role level so the Layer 2 lookups
    # in 5f read real values instead of silently falling back to defaults
    direct_reports=('direct_reports', 'first'),
    indirect_reports=('indirect_reports', 'first'),
    levels_from_ceo=('levels_from_ceo', 'first'),
    approval_limit=('approval_limit', 'first'),
    budget_responsibility=('budget_responsibility', 'first'),
    l1_functions_in_scope=('l1_functions_in_scope', 'first'),
    unique_skills=('skills_required',
                   lambda x: len(set(s for sl in x for s in sl))),
    max_bloom=('cognitive_complexity', 'max'),
    mean_bloom=('cognitive_complexity', 'mean'),
    bloom_std=('cognitive_complexity', 'std'),
    l2_breadth=('L2_process', 'nunique'),
    l3_breadth=('L3_activity', 'nunique'),
    pct_cross_functional=('cross_functional', 'mean'),
    primary_defense_line=('defense_line', lambda x: x.mode().iloc[0]),
    mean_importance=('importance', 'mean'),
    pct_regulatory=('regulatory_driven', 'mean'),
    beta=('ai_exposure_class',
          lambda x: (sum(1 for v in x if v == 'E1')
                     + 0.5 * sum(1 for v in x if v == 'E2')) / len(x)),
    pct_e0=('ai_exposure_class', lambda x: (x == 'E0').mean()),
    pct_e1=('ai_exposure_class', lambda x: (x == 'E1').mean()),
    pct_e2=('ai_exposure_class', lambda x: (x == 'E2').mean()),
    pct_automate=('ai_disposition', lambda x: (x == 'Automate').mean()),
    pct_augment=('ai_disposition', lambda x: (x == 'Augment').mean()),
).round(3)

# ── 5b. Hay Factor 1: KNOW-HOW ──
# Technical Depth:    cognitive complexity * skill breadth
# Managerial Breadth: proportion of supervisory/strategic tasks
# Human Relations:    proportion of interpersonal/advisory tasks
MGMT_KW = ['direct','supervise','manage','lead','plan','coordinate',
           'delegate','oversee','strategic','governance','budget']
HR_KW = ['advise','counsel','coach','mentor','negotiate','present',
         'relationship','communicate','client','stakeholder','mediate']

def compute_know_how(role_tasks):
    n = len(role_tasks)
    # Technical Depth (0-100): avg Bloom * unique skill count, normalized
    avg_bloom = role_tasks['cognitive_complexity'].mean()
    skills = set(s for sl in role_tasks['skills_required'] for s in sl)
    tech_depth = min(100, avg_bloom * len(skills) / 2)
    # Managerial Breadth (0-100): % of tasks with mgmt keywords
    mgmt_count = sum(
        1 for _, t in role_tasks.iterrows()
        if any(kw in (t['task_name'] + ' ' + t['task_description']).lower()
               for kw in MGMT_KW))
    mgmt_breadth = (mgmt_count / n) * 100
    # Human Relations (0-100): % of tasks with HR keywords
    hr_count = sum(
        1 for _, t in role_tasks.iterrows()
        if any(kw in (t['task_name'] + ' ' + t['task_description']).lower()
               for kw in HR_KW))
    human_rel = (hr_count / n) * 100
    # Composite: weighted sum (Hay weights technical depth highest)
    return round(tech_depth * 0.50 + mgmt_breadth * 0.25 + human_rel * 0.25, 1)

# ── 5c. Hay Factor 2: PROBLEM SOLVING ──
# Thinking Environment: derived from E-class distribution (more E0 = more novel)
# Thinking Challenge:   Bloom's level distribution
def compute_problem_solving(role_tasks):
    # Thinking Environment (0-100): based on E-class distribution.
    # E0 tasks = fully human, novel problems; E1 = most structured; E2 = mid.
    e_weights = {'E0': 100, 'E1': 30, 'E2': 50}
    think_env = role_tasks['ai_exposure_class'].map(e_weights).mean()
    # Thinking Challenge (0-100): Bloom distribution mapped to challenge
    # scores, so a Bloom-6 task counts 10x a Bloom-1 task
    bloom_dist = role_tasks['cognitive_complexity'].value_counts(normalize=True)
    challenge = sum(
        pct * {1: 10, 2: 20, 3: 40, 4: 60, 5: 80, 6: 100}.get(level, 40)
        for level, pct in bloom_dist.items()
    )
    # Composite. (Classic Hay expresses Problem Solving as a percentage of
    # Know-How; here both are kept on the same 0-100 scale for comparability.)
    return round(think_env * 0.40 + challenge * 0.60, 1)

# ── 5d. Hay Factor 3: ACCOUNTABILITY ──
# Freedom to Act:  inverse of regulatory constraint
# Scope/Magnitude: mean task importance (frequency and L1 materiality
#                  could further refine this)
# Impact:          defense-line weight * importance
LOD_WEIGHT = {'1st': 1.0, '2nd': 0.7, '3rd': 0.5, 'NA': 0.3}

def compute_accountability(role_tasks):
    # Freedom to Act (0-100): 100 - (% regulatory-driven * 100)
    freedom = (1 - role_tasks['regulatory_driven'].mean()) * 100
    # Scope/Magnitude (0-100): mean importance (1-5) normalized to 0-100
    scope = role_tasks['importance'].mean() * 20   # 5 -> 100
    # Impact (0-100): defense line weight * importance
    lod_weights = role_tasks['defense_line'].map(LOD_WEIGHT).fillna(0.3)
    impact = (lod_weights * role_tasks['importance'] / 5 * 100).mean()
    return round(freedom * 0.30 + scope * 0.35 + impact * 0.35, 1)

# ── 5e. Layer 2: Job-Level Organizational Context ──
# These variables come from HRIS, not from task attributes. They modulate
# the task-derived Hay scores to reflect organizational position — the
# "amplifier" that distinguishes an analyst from a VP doing similar
# analytical work.

def span_of_control_multiplier(direct_reports, indirect_reports=0):
    # Hay managerial breadth scale based on total reports
    total = (direct_reports or 0) + (indirect_reports or 0)
    if total == 0:
        return 1.0     # Individual contributor
    elif total <= 5:
        return 1.15    # Team lead
    elif total <= 20:
        return 1.30    # Manager
    elif total <= 100:
        return 1.50    # Senior manager / Director
    else:
        return 1.70    # VP / Executive

def decision_authority_score(approval_limit):
    # Map financial approval authority to the Hay freedom-to-act scale
    # (0-100), based on the Hay Guide Chart A progression
    if approval_limit is None:
        return 30                   # default: controlled
    limit = float(approval_limit)
    if limit < 10_000:
        return 15                   # Prescribed
    elif limit < 100_000:
        return 30                   # Controlled
    elif limit < 1_000_000:
        return 50                   # Standardized
    elif limit < 50_000_000:
        return 70                   # Generally regulated
    elif limit < 500_000_000:
        return 85                   # Broadly defined
    else:
        return 95                   # Strategic direction

def budget_magnitude_score(budget):
    # Hay uses a geometric (log) scale for magnitude.
    # Normalized to 0-100 using ln(budget) / ln(max_expected);
    # clamped so sub-unit budgets can't go negative
    if budget is None or budget <= 0:
        return 20
    return max(0, min(100, round(math.log(budget)
                                 / math.log(10_000_000_000) * 100)))

def reporting_level_score(levels_from_ceo):
    # Fewer levels from CEO = more abstract thinking environment
    # (Hay thinking environment scale)
    level_map = {1: 95, 2: 80, 3: 65, 4: 50, 5: 40, 6: 30}
    return level_map.get(levels_from_ceo, max(15, 95 - levels_from_ceo * 12))

def cross_functional_breadth(l1_count):
    # Number of L1 functions in scope
    if l1_count is None or l1_count <= 1:
        return 1.0
    elif l1_count <= 3:
        return 1.15
    else:
        return 1.30

# ── 5f. Compute combined Hay scores (Layer 1 + Layer 2) ──
hay_scores = {}
for title in role_profiles.index:
    role_tasks = merged[merged.job_title == title]
    row = role_profiles.loc[title]

    # Layer 1: task-derived base scores
    kh_base = compute_know_how(role_tasks)
    ps_base = compute_problem_solving(role_tasks)
    ac_base = compute_accountability(role_tasks)

    # Layer 2: job-level organizational context modulation
    span_mult = span_of_control_multiplier(
        row.get('direct_reports', 0), row.get('indirect_reports', 0))
    xfunc_mult = cross_functional_breadth(
        row.get('l1_functions_in_scope', 1))
    da_score = decision_authority_score(row.get('approval_limit', None))
    bm_score = budget_magnitude_score(row.get('budget_responsibility', None))
    rl_score = reporting_level_score(row.get('levels_from_ceo', 6))

    # Combine: Layer 1 base * Layer 2 modulation.
    # Know-How: span of control amplifies managerial breadth,
    # cross-functional scope amplifies overall breadth
    kh_final = min(100, kh_base * span_mult * xfunc_mult)
    # Problem Solving: reporting level modulates thinking environment
    # (closer to CEO = more abstract/strategic thinking required)
    ps_final = min(100, ps_base * 0.60 + rl_score * 0.40)
    # Accountability: decision authority and budget replace the
    # task-derived freedom/scope estimates with actual org data
    ac_final = min(100,
                   ac_base * 0.30 +    # Task-derived impact
                   da_score * 0.35 +   # Decision authority (HRIS)
                   bm_score * 0.35)    # Budget magnitude (HRIS)

    hay_scores[title] = {
        'know_how': round(kh_final, 1),
        'problem_solving': round(ps_final, 1),
        'accountability': round(ac_final, 1),
        # Keep Layer 1 scores for comparison
        'kh_task_only': round(kh_base, 1),
        'ps_task_only': round(ps_base, 1),
        'ac_task_only': round(ac_base, 1),
        # Layer 2 context values
        'span_multiplier': span_mult,
        'decision_authority': da_score,
        'budget_magnitude': bm_score,
        'reporting_level': rl_score,
    }

hay_df = pd.DataFrame(hay_scores).T
hay_df['hay_composite'] = (hay_df['know_how'] * 0.40
                           + hay_df['problem_solving'] * 0.30
                           + hay_df['accountability'] * 0.30).round(1)
# Also compute a task-only composite for Layer 1 vs combined comparison
hay_df['hay_task_only'] = (hay_df['kh_task_only'] * 0.40
                           + hay_df['ps_task_only'] * 0.30
                           + hay_df['ac_task_only'] * 0.30).round(1)

role_profiles = role_profiles.join(hay_df)

# ── 5g. Derive job levels from composite Hay score ──
# Thresholds follow Hay's ~15% geometric step progression across levels
def hay_to_level(composite):
    if composite >= 80:
        return 'Executive / SVP'
    elif composite >= 68:
        return 'Director / VP'
    elif composite >= 56:
        return 'Senior Manager / Lead'
    elif composite >= 44:
        return 'Manager / Senior Specialist'
    elif composite >= 32:
        return 'Analyst / Specialist'
    else:
        return 'Associate / Coordinator'

role_profiles['suggested_level'] = role_profiles.hay_composite.apply(hay_to_level)

# ── 5h. Compare and identify misalignment ──
print("\n=== HAY EVALUATION SUMMARY ===")
print(f"{'Role':<40} {'KH':>5} {'PS':>5} {'AC':>5} {'Comp':>5} "
      f"{'Task-Only':>9} {'Δ':>4} {'Level'}")
print("─" * 100)
for title, p in role_profiles.iterrows():
    delta = p.hay_composite - p.hay_task_only
    flag = '⬆' if delta > 10 else ('⬇' if delta < -5 else '')
    print(f"{title[:40]:<40} {p.know_how:>5.0f} {p.problem_solving:>5.0f} "
          f"{p.accountability:>5.0f} {p.hay_composite:>5.0f} "
          f"{p.hay_task_only:>9.0f} {delta:>+4.0f} {p.suggested_level} {flag}")

# Show where Layer 2 context makes the biggest difference
print("\nLargest Layer 2 impact (org context vs task-only):")
role_profiles['layer2_delta'] = (role_profiles.hay_composite
                                 - role_profiles.hay_task_only)
top_delta = role_profiles.nlargest(10, 'layer2_delta')
for title, p in top_delta.iterrows():
    print(f"  {title[:45]:<45} +{p.layer2_delta:.0f} pts "
          f"(span={p.span_multiplier:.2f}x, auth={p.decision_authority:.0f}, "
          f"budget={p.budget_magnitude:.0f})")

# ── 5i. Future-state modeling: remove automated tasks ──
# Layer 1 (task composition) changes; Layer 2 (org context) held constant
future_tasks = merged[merged.ai_disposition != 'Automate']
print(f"\nFuture-state: {len(future_tasks)} tasks remain after automation")
print(f"Removed: {len(merged) - len(future_tasks)} tasks "
      f"({(len(merged) - len(future_tasks)) / len(merged) * 100:.1f}%)")

# Recompute Layer 1 for the future state; keep Layer 2 constant
for title in role_profiles.index:
    ft = future_tasks[future_tasks.job_title == title]
    if len(ft) > 0:
        # Layer 1 recalculated from the remaining tasks
        kh_base = compute_know_how(ft)
        ps_base = compute_problem_solving(ft)
        ac_base = compute_accountability(ft)
        # Layer 2 held constant (org context doesn't change)
        p = role_profiles.loc[title]
        kh_f = min(100, kh_base * p.span_multiplier
                   * cross_functional_breadth(p.get('l1_functions_in_scope', 1)))
        ps_f = min(100, ps_base * 0.60 + p.reporting_level * 0.40)
        ac_f = min(100, ac_base * 0.30 + p.decision_authority * 0.35
                   + p.budget_magnitude * 0.35)
        role_profiles.at[title, 'future_hay'] = round(
            kh_f * 0.4 + ps_f * 0.3 + ac_f * 0.3, 1)
    else:
        role_profiles.at[title, 'future_hay'] = 0   # role eliminated

# Identify roles that shift levels
role_profiles['future_level'] = role_profiles.future_hay.apply(hay_to_level)
shifted = role_profiles[role_profiles.suggested_level != role_profiles.future_level]
print(f"\n{len(shifted)} roles shift job levels in the future state")

# Show which way they shift
for title, p in shifted.iterrows():
    direction = '↓ DOWNGRADE' if p.future_hay < p.hay_composite else '↑ UPGRADE'
    print(f"  {title[:40]:<40} {p.suggested_level} → {p.future_level} "
          f"({p.hay_composite:.0f} → {p.future_hay:.0f}) {direction}")
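To make the Layer 1 / Layer 2 interaction concrete, here is a small worked example that calls the scalar helpers from Steps 5e and 5g. All input values are hypothetical, and the three base scores stand in for what compute_know_how, compute_problem_solving, and compute_accountability would return for a real role.

# Hypothetical role: a credit-risk team lead with 4 direct reports,
# a $250k approval limit, a $2M budget, 4 levels from the CEO.
# Assume Layer 1 produced these base scores:
kh_base, ps_base, ac_base = 52.0, 48.0, 45.0

span_mult = span_of_control_multiplier(4)      # 1.15 (team lead)
xfunc_mult = cross_functional_breadth(1)       # 1.0  (single L1 function)
da = decision_authority_score(250_000)         # 50   (standardized)
bm = budget_magnitude_score(2_000_000)         # 63 on the log scale
rl = reporting_level_score(4)                  # 50

kh = min(100, kh_base * span_mult * xfunc_mult)        # 59.8
ps = min(100, ps_base * 0.60 + rl * 0.40)              # 48.8
ac = min(100, ac_base * 0.30 + da * 0.35 + bm * 0.35)  # ~53.1
composite = kh * 0.40 + ps * 0.30 + ac * 0.30          # ~54.5
print(hay_to_level(composite))                 # 'Manager / Senior Specialist'

Note how modest organizational context lifts a task-only composite of 48.3 (the same weights applied to the base scores) into the next band; this is exactly the layer2_delta that Step 5h surfaces.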
6

Generate Hierarchy Redesign Outputs

Use role profiles and Hay evaluations to produce actionable deliverables for workforce transformation.

# Step 6: Analytical outputs

# ── A. Skill gap matrix ──
# load_lms_data() is a placeholder for your LMS export loader; it should
# return a job_title × skill matrix of current capabilities
current_skills = load_lms_data()
required_skills = (merged.explode('skills_required')
                   .groupby(['job_title', 'skills_required']).size()
                   .unstack(fill_value=0))
gap_matrix = required_skills.subtract(current_skills, fill_value=0)

# ── B. Role consolidation candidates ──
from itertools import combinations
from sklearn.metrics.pairwise import cosine_similarity

# Build task vectors per role (binary: does the role include the task?)
role_task_matrix = (merged.groupby(['job_title', 'task_id']).size()
                    .unstack(fill_value=0))
sim = cosine_similarity(role_task_matrix)
sim_df = pd.DataFrame(sim, index=role_task_matrix.index,
                      columns=role_task_matrix.index)

# Identify role pairs with >60% task overlap
for r1, r2 in combinations(role_profiles.index, 2):
    if sim_df.loc[r1, r2] > 0.6:
        fte = role_profiles.loc[[r1, r2], 'fte_count'].sum()
        print(f"Consolidate: {r1} + {r2} "
              f"(similarity: {sim_df.loc[r1, r2]:.0%}, FTEs: {fte})")

# ── C. Hay evaluation report ──
print("\n=== HAY EVALUATION REPORT ===")
for title, p in role_profiles.iterrows():
    print(f"\n{'='*60}")
    print(f"ROLE: {title}")
    print(f"Current Grade: {p.current_grade} | Suggested Level: {p.suggested_level}")
    print(f"{'─'*60}")
    print(f"KNOW-HOW: {p.know_how:>6.1f}")
    print(f"  Technical Depth: {p.unique_skills} skills, "
          f"Bloom avg={p.mean_bloom:.1f}, max={p.max_bloom}")
    print(f"  Mgmt Breadth: {p.l2_breadth} L2 processes, "
          f"{p.l3_breadth} L3 activities")
    print(f"PROBLEM SOLVING: {p.problem_solving:>6.1f}")
    print(f"  Thinking Env: beta={p.beta:.3f} "
          f"(lower = more novel, E0-heavy)")
    print(f"  Challenge: Bloom std={p.bloom_std:.1f}, "
          f"{p.pct_cross_functional:.0%} cross-functional")
    print(f"ACCOUNTABILITY: {p.accountability:>6.1f}")
    print(f"  Freedom to Act: {(1 - p.pct_regulatory) * 100:.0f}% non-regulatory")
    print(f"  Defense Line: {p.primary_defense_line}")
    print(f"  Importance: {p.mean_importance:.1f}/5")
    print(f"{'─'*60}")
    print(f"HAY COMPOSITE: {p.hay_composite:>6.1f} → {p.suggested_level}")
    if 'future_hay' in p and p.future_hay != p.hay_composite:
        print(f"FUTURE STATE: {p.future_hay:>6.1f} → {p.future_level}")

# ── D. SWP scenario model with Hay impact ──
def model_scenario(roles_df, merged_df, automate_threshold):
    """Model headcount and Hay-level shifts at a given automation threshold."""
    at_risk = roles_df[roles_df.pct_automate > automate_threshold]
    if at_risk.empty:
        return {'roles_affected': 0, 'fte_impact': 0, 'level_shifts': 0,
                'skills_at_risk': pd.Series(dtype=int)}
    fte_impact = at_risk.fte_count.sum() * at_risk.pct_automate.mean()
    level_shifts = (at_risk.suggested_level != at_risk.future_level).sum()
    return {
        'roles_affected': len(at_risk),
        'fte_impact': round(fte_impact),
        'level_shifts': level_shifts,
        'skills_at_risk': merged_df[
            merged_df.job_title.isin(at_risk.index)
            & (merged_df.ai_disposition == 'Automate')
        ].explode('skills_required')['skills_required'].value_counts().head(10)
    }

scenarios = {
    'Conservative (>50% auto)': model_scenario(role_profiles, merged, 0.5),
    'Balanced (>30% auto)':     model_scenario(role_profiles, merged, 0.3),
    'Aggressive (>15% auto)':   model_scenario(role_profiles, merged, 0.15),
}
for name, s in scenarios.items():
    print(f"\n{name}: {s['roles_affected']} roles, ~{s['fte_impact']} FTEs, "
          f"{s['level_shifts']} level shifts")
    print(f"  Top skills at risk: {', '.join(s['skills_at_risk'].index[:5])}")

# ── E. Export role profiles for Hay Guide Chart input ──
export_cols = ['task_count', 'fte_count', 'current_grade',
               # Layer 1 (task-derived)
               'kh_task_only', 'ps_task_only', 'ac_task_only', 'hay_task_only',
               # Layer 2 context
               'span_multiplier', 'decision_authority',
               'budget_magnitude', 'reporting_level',
               # Combined Hay scores
               'know_how', 'problem_solving', 'accountability', 'hay_composite',
               'suggested_level', 'layer2_delta',
               # Future state
               'future_hay', 'future_level',
               # AI exposure
               'beta', 'pct_automate', 'pct_e0', 'pct_e1', 'pct_e2']
export_df = role_profiles[[c for c in export_cols if c in role_profiles.columns]]
export_df.to_csv('hay_evaluation_export.csv')
print(f"\nExported {len(export_df)} role evaluations to hay_evaluation_export.csv")
print(f"Columns: {list(export_df.columns)}")
7

Validate & Iterate

Quality assurance steps before presenting results to stakeholders.

  • SME Review: Have business leads review the role–task mapping for their function. Flag tasks that are missing, misattributed, or obsolete.
  • Bloom's Calibration: Spot-check 10% of tasks per L1 function to confirm cognitive complexity ratings match SME judgment.
  • E0/E1/E2 Validation: Run the AIOE cross-validation (Step 4f). Compute the Spearman rank correlation between occupation-level β and published AIOE z-scores. Directional agreement (both rankings ordering occupations similarly) is the validation target. If the rank correlation is weak, review the E0/E1/E2 keyword patterns for your domain-specific terminology. A sketch of this check, along with the coverage test and defense line audit below, follows this list.
  • Hay Cross-Check: Compare computed Hay composites against existing Korn Ferry evaluations, where available. Large discrepancies (>15 points) indicate either scoring issues or genuinely misgraded roles. Use the Step 6 export (hay_evaluation_export.csv) for the side-by-side comparison.
  • Coverage Test: Confirm that every internal job title maps to at least 5 reference tasks. Titles with <5 matches may indicate gaps in the inventory or misclassification.
  • Defense Line Audit: Verify no role mixes 1st-line and 2nd/3rd-line tasks. Flag violations for review with Risk and Compliance.
  • Future-State Reasonableness: Review roles where the future-state Hay level differs from current grade by >2 levels. These are candidates for accelerated reskilling or managed transition.
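A minimal sketch of three of these checks, assuming the Step 5 objects (merged, role_profiles) are in scope, that task records carry an onet_code column (adjust to your export's field name), and that aioe_zscores is a Series of published AIOE z-scores indexed by O*NET code that you supply from your own source:

from scipy.stats import spearmanr   # scipy ships as a scikit-learn dependency

# E0/E1/E2 validation: rank-correlate occupation-level beta vs AIOE z-scores
occ_beta = (merged.groupby('onet_code')['ai_exposure_class']
            .apply(lambda x: ((x == 'E1').sum() + 0.5 * (x == 'E2').sum())
                   / len(x)))
common = occ_beta.index.intersection(aioe_zscores.index)
rho, pval = spearmanr(occ_beta.loc[common], aioe_zscores.loc[common])
print(f"Spearman rho={rho:.2f} (p={pval:.3f}) over {len(common)} occupations")

# Coverage test: every internal title should map to >=5 reference tasks
thin = role_profiles[role_profiles.task_count < 5]
print(f"{len(thin)} titles with <5 mapped tasks:", list(thin.index))

# Defense line audit: flag roles mixing 1st-line with 2nd/3rd-line tasks
lines_per_role = merged.groupby('job_title')['defense_line'].agg(set)
mixed = lines_per_role[lines_per_role.apply(
    lambda s: ('1st' in s) and len({'2nd', '3rd'} & s) > 0)]
print(f"{len(mixed)} roles mix defense lines:", list(mixed.index))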

Pipeline Architecture Summary

Data Flow: Reference Inventory (JSON) + HRIS Export + Job Descriptions + LMS Data → Role–Task Mapping (manual + fuzzy match) → Org-Specific Task Inventory (merged) → E0/E1/E2 Classification (cross-validated against AIOE) → Role Profiles (task-aggregated) → Hay Factor Computation (Know-How, Problem Solving, Accountability) → Analytical Outputs (Hay evaluations, future-state modeling, skill gap matrices, consolidation candidates, SWP scenarios)
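Read as code, the flow is a strictly linear pipeline. The skeleton below restates it as a call sequence; every function name is an illustrative placeholder for the corresponding step's code, not a real API.

all_tasks     = load_reference_inventory('inventory.json')             # Step 1: JSON export
role_task_map = map_roles_to_tasks(hris, job_descriptions, all_tasks)  # Step 2: manual + fuzzy match
merged        = build_org_inventory(role_task_map, all_tasks, hris)    # Step 3: merged org inventory
merged        = classify_ai_exposure(merged)                           # Step 4: E0/E1/E2 + AIOE check
role_profiles = compute_hay_profiles(merged)                           # Step 5: KH / PS / AC
generate_outputs(role_profiles, merged)                                # Step 6: reports and exports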

Estimated Effort

4–8 weeks for a mid-sized bank (<20k FTEs). Primary bottleneck is SME validation of role–task mappings (Step 2).

Team Composition

1 data scientist (pipeline), 1 HR/workforce planning analyst (mapping), SMEs from each L1 function (validation), 1 project lead.

Technology Stack

Python (pandas, scikit-learn), any SQL database for storage, BI tool (Power BI / Tableau) for visualization, Excel for SME worksheets.

Getting Started: Export the full JSON from the Export Center. That file contains the complete reference inventory, ready to load into your pipeline as Step 1. The Role Mapping Template CSV provides a blank worksheet for Step 2.
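A minimal loader sketch for that Step 1 ingestion; the filename and the top-level JSON structure are assumptions, so adjust both to match the actual export.

import json
import pandas as pd

# Load the full-inventory JSON export into the all_tasks frame used by Step 2+
with open('task_inventory_full.json') as f:       # filename assumed
    payload = json.load(f)

# Handle either a bare list of task records or a {'tasks': [...]} wrapper
records = payload['tasks'] if isinstance(payload, dict) else payload
all_tasks = pd.DataFrame(records)
print(len(all_tasks), 'reference tasks loaded')   # expect 4,075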
Export Center

Download the full inventory or a filtered subset.

Full Task Inventory (CSV)

4,075 tasks × 18 fields

Filtered View (CSV)

Tasks matching current Explorer filters

Full Task Inventory (JSON)

Complete dataset for system integration

Role Mapping Template (CSV)

Blank worksheet for mapping your roles