Data Integration: RM/COBOL to AWS RDS with Redshift and OpenSearch (Ordered Sync)

Overview

A PBM platform relied on a legacy RM/COBOL system as the system of record, while modern digital channels needed fast access to claims and transaction data for portal experiences, search, and analytics. The requirement was to synchronize COBOL transactions into AWS in near-realtime, ensuring strict ordering (the same sequence as inserted/committed in COBOL) and delivering the data to Application AWS RDS, Amazon Redshift, and Amazon OpenSearch.

External pharmacy claims ingestion was a key upstream dependency. We used Mirth Connect as the integration hub for receiving pharmacy claim transactions, performing validation, normalization, and routing, and reliably handing them off to COBOL workflows. After each COBOL commit, Audit Programs wrote auditable change records into an EFS-backed landing zone. A Relativity Driver (the orchestration/CDC component) processed these records, using SQS-based decoupling for retries and backpressure, then published ordered events into Confluent Kafka. Kafka served as the streaming backbone for fan-out to operational, analytical, and search destinations.

The solution was built with HIPAA-aligned controls: encryption in transit and at rest, least-privilege access, and audit-friendly monitoring. The delivered platform enabled modernization without destabilizing the COBOL environment, while providing fresh, ordered data to modern applications.

Client Profile

  • Industry: Healthcare / PBM (Legacy Claims & Eligibility Platform Modernization)
  • Region: North America
  • HQ: United States (Southern USA)
  • Operations: Nationwide
  • Company Size: Established PBM / Third-Party Administrator with legacy core platform
  • Revenue: Approx. $500M – $2B annually
  • Staff: ~100-200 employees (claims ops + clinical + growing cloud/data teams)

What They Do:

Core Mission: Run pharmacy claims operations on a mature COBOL-based core while modernizing data access for digital products, analytics, and search—without disrupting mission-critical processing.

Key Services:

  • Legacy Claims Processing: COBOL system-of-record handling eligibility, adjudication outcomes, pricing, and audit trails.
  • External Claims Intake: Uses an integration engine (e.g., Mirth) to ingest pharmacy claims/transactions from multiple partners and formats.
  • Data Products for Modern Apps: Provides near-realtime operational datasets (ODS), analytics warehouse feeds, and searchable indexes.
  • BI & Compliance Reporting: Supports executive dashboards and regulatory/compliance audits with strict traceability.

Operational Model: Operates as a high-integrity transaction and audit system where correctness and ordering matter. Uses event streaming and data replication to create cloud-native read models (RDS), analytics (Redshift), and search (OpenSearch), enabling modern applications while keeping COBOL stable.

Features

This project delivered a highly resilient, ordered, near-realtime data integration pipeline from legacy COBOL systems into modern AWS-based destinations, enabling real-time digital services in healthcare PBM (Pharmacy Benefit Management).

Core Capabilities:

  • Near-Realtime Synchronization: Achieved sub-minute latency between COBOL transaction commit and downstream availability.
  • Strict Transaction Ordering: Ensured events are processed and delivered exactly as committed in the COBOL system — critical for financial and claims accuracy.
  • Multi-Destination Delivery: Single source of truth (Kafka) fan-out to three distinct workloads:
    1. Operational DB: AWS RDS (for application use cases).
    2. Analytics Warehouse: Amazon Redshift (for reporting & BI).
    3. Search Index: Amazon OpenSearch (for portal search & self-service).
  • Idempotent & Retry-Safe Processing: Designed to handle partial failures, retries, and replay without duplicates or data corruption.
  • Replayable & Auditable Pipeline: Supports backfills, debugging, compliance audits, and recovery from outages using Kafka’s retention and replay capabilities.
  • Resilient External Ingestion: Mirth Connect handles variable pharmacy claim formats, validates payloads, and ensures reliability across partners.
  • Zero Downtime Modernization: Enabled new digital features without touching or disrupting the legacy COBOL system.
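
The idempotent, retry-safe behavior listed above can be illustrated with a sequence-checked upsert. This is a minimal sketch, not the project's actual consumer: it uses an in-memory SQLite database as a stand-in for RDS, and the `claims` table, its columns, and the `apply_event` helper are hypothetical names chosen for illustration. PostgreSQL accepts a similar `INSERT ... ON CONFLICT ... DO UPDATE ... WHERE` clause.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for the operational database (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims (claim_id TEXT PRIMARY KEY, status TEXT, seq INTEGER)"
    )
    return conn


def apply_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Upsert a claim row only when the event is newer than the stored row.

    Returns True if the row was inserted or updated, False if the event was
    a duplicate or older than what is already stored.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO claims (claim_id, status, seq)
        VALUES (:claim_id, :status, :seq)
        ON CONFLICT(claim_id) DO UPDATE SET
            status = excluded.status,
            seq = excluded.seq
        WHERE excluded.seq > claims.seq
        """,
        event,
    )
    conn.commit()
    return conn.total_changes > before


conn = make_store()
apply_event(conn, {"claim_id": "C1", "status": "SUBMITTED", "seq": 1})
apply_event(conn, {"claim_id": "C1", "status": "PAID", "seq": 2})
# A retry or replay of the same event leaves the row untouched.
apply_event(conn, {"claim_id": "C1", "status": "PAID", "seq": 2})
```

Because the sequence check lives in the write itself, replaying a range of events after an outage converges to the same final state as the first delivery.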

Technologies

A full-stack, cloud-native streaming architecture was built using industry-leading tools:

Layer                      | Technology
Integration Hub            | Mirth Connect (HL7/FHIR/JSON parsing, routing, validation)
Legacy Data Capture        | COBOL Audit Programs + EFS Landing Zone
Streaming Engine           | Confluent Kafka (with custom partitioning strategy)
Orchestration & Buffering  | Amazon SQS (dead-letter queues, backpressure handling)
Compute & Orchestration    | AWS Lambda (event processing), Python (custom drivers)
Operational Database       | Amazon RDS (PostgreSQL/MySQL)
Analytical Warehouse       | Amazon Redshift (columnar, batch analytics)
Search & Discovery         | Amazon OpenSearch (full-text search, filtering)
CI/CD & DevOps             | GitHub Actions, AWS CodeBuild, CloudFormation
Observability              | CloudWatch (metrics/logs), X-Ray (tracing), Grafana (dashboards)

Additional Tools: Kafka Connect (optional for bulk ingestion), EKS (for containerized components if needed), IAM roles, KMS encryption.

Security Model

Designed from the ground up with HIPAA-aligned security controls, ensuring confidentiality, integrity, and auditability.

Key Security Principles:

  • Encryption by Default:
    1. In Transit: TLS 1.3 enforced across all connections (Mirth → COBOL, Kafka → RDS/Redshift/OpenSearch).
    2. At Rest: KMS-managed encryption for EFS, RDS, Redshift, OpenSearch, SQS, and Kafka logs.
  • Least Privilege Access:
    1. IAM roles assigned per component (Lambda, EC2, EKS) with minimal permissions.
    2. No hardcoded credentials; secrets managed via AWS Secrets Manager / Parameter Store.
  • Audit Trail & Traceability:
    1. All data movement logged via CloudTrail, VPC Flow Logs, and Kafka audit logs.
    2. Event metadata includes timestamps, sequence IDs, source system, and user context.
    3. DLQs and error topics capture failed messages for forensic review.
  • Network Isolation:
    1. Private subnets for Kafka, RDS, Redshift, OpenSearch.
    2. VPC endpoints for secure access to AWS services (no public internet exposure).
    3. Security groups restrict traffic to only required ports and IPs.

Compliance Alignment: HIPAA (Business Associate Agreement), SOC 2 Type II, NIST SP 800-53.

Data Types & Standards

The system processes sensitive, regulated healthcare data under strict standards.

Data Types Handled:

  • Pharmacy Claims (Primary): Drug name, NDC, dosage, quantity, price, prescriber ID, patient ID, payer info.
  • Transaction Metadata: Commit timestamp, sequence number, source system (COBOL), status (committed, failed).
  • Patient Identifiers: PII (e.g., name, DOB, SSN) — masked/sanitized where possible.
  • PHI (Protected Health Information): Clinical and prescription details subject to HIPAA.
  • Financial Records: Claim adjudication results, payment amounts, co-pays, deductibles.
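
To make the data types above concrete, the following sketch shows one way an event envelope could carry transaction metadata while masking a patient identifier before it leaves the capture layer. The `ClaimEvent` fields, the `build_event` helper, the input record keys, and the sample values are all hypothetical; the source does not specify the actual schema or masking rules.

```python
import re
from dataclasses import asdict, dataclass


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of a 9-digit SSN."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + digits[-4:]


@dataclass(frozen=True)
class ClaimEvent:
    """Illustrative envelope: transaction metadata plus a masked identifier."""
    sequence_number: int
    commit_timestamp: str
    source_system: str
    status: str
    ndc: str
    patient_ssn: str  # stored masked only


def build_event(record: dict, sequence_number: int) -> dict:
    """Turn a parsed audit record (hypothetical keys) into a publishable event."""
    event = ClaimEvent(
        sequence_number=sequence_number,
        commit_timestamp=record["commit_timestamp"],
        source_system="COBOL",
        status=record["status"],
        ndc=record["ndc"],
        patient_ssn=mask_ssn(record["patient_ssn"]),
    )
    return asdict(event)


sample = {
    "commit_timestamp": "2025-01-01T12:00:00+00:00",
    "status": "committed",
    "ndc": "00000-0000-00",  # placeholder, not a real NDC
    "patient_ssn": "123-45-6789",
}
event = build_event(sample, sequence_number=7)
```

Masking at the point where the envelope is built means downstream topics and indexes never see the raw identifier.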

Regulatory & Industry Standards:

Standard        | Application
HIPAA           | Core compliance framework for PHI protection
SOC 2           | Trust Services Criteria (Security, Availability, Confidentiality)
NIST SP 800-53  | Control mapping for federal systems
HL7 FHIR        | Used in Mirth for standardizing claim exchange
PCI DSS         | If payment data is involved (e.g., copay processing)

Note: All PII/PHI is handled via encrypted storage, access logging, and role-based access control.

Infrastructure Architecture

Built on AWS with high availability, fault tolerance, and scalability.

Regional Deployment:

  • Primary Region: us-east-1 (multi-AZ setup)
  • Backup/DR Option: Optional replication to us-west-2 (if required)

Network Design:

  • VPC Topology:
    1. Public Subnets: For Mirth (APIs), NAT Gateways, and monitoring.
    2. Private Subnets: For Kafka, RDS, Redshift, OpenSearch, and Lambda; SQS is reached through VPC endpoints.
  • VPC Peering / Transit Gateway: For cross-region communication (if applicable).
  • PrivateLink / VPC Endpoints: Secure access to AWS services without public internet.

Storage & Data Flow:

  • Inbound Claims: Received via Mirth (HL7/FHIR/JSON), validated, normalized, and routed.
  • COBOL Side: Audit Programs write change records to EFS-backed staging zone (shared file system).
  • Capture & Queue: Relativity Driver reads EFS files → parses → sends to SQS (buffered, retry-safe).
  • Stream Processing: SQS triggers Lambda → publishes ordered events to Kafka.
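    1. Kafka Topic: transaction
    2. Partitioning Strategy: Keyed by transaction ID or sequence number to preserve order (Kafka guarantees ordering only within a partition, so events that must stay in order need to land in the same partition).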
    3. Retention: 7 days (configurable for replay/backfill).
  • Destination Fan-Out:
    1. RDS: Kafka Connect or Lambda consumer performs idempotent upserts.
    2. Redshift: Batch load via Kafka Connect (or Lambda → S3 → Redshift).
    3. OpenSearch: Real-time indexing via Lambda or Kafka Connect sink connector.
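
The partitioning step in this flow can be sketched as follows. The example assumes a hypothetical ordering key (`key`) on each event and uses `zlib.crc32` purely as a stand-in hash; Kafka's default Java partitioner uses a murmur2 hash of the key bytes, and the project's custom strategy is not described in detail here. What the sketch shows is the property that matters: events with the same key always map to the same partition and keep their arrival order there.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map an ordering key to a partition deterministically.

    crc32 is a stand-in for this sketch, not Kafka's own partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each event to its partition's log, preserving arrival order."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event["key"], num_partitions)].append(event)
    return partitions


events = [
    {"key": "TXN-A", "seq": 1},
    {"key": "TXN-B", "seq": 2},
    {"key": "TXN-A", "seq": 3},
    {"key": "TXN-B", "seq": 4},
]
logs = route(events, num_partitions=6)
```

Order across different keys is not guaranteed once more than one partition is used. If a total order over every event is required, a single-partition topic is the simplest option, at the cost of consumer parallelism.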

Operational Resilience Patterns:

  • SQS Dead-Letter Queues (DLQ): Failed messages stored for manual inspection/reprocessing.
  • Kafka Replay: Ability to reprocess historical events safely (used during failover or schema updates).
  • Backpressure Handling: SQS buffers bursts so downstream consumers are not overloaded; auto-scaling Lambda functions manage spikes.
  • Auto-Scaling: Kafka brokers scale based on throughput; RDS instances auto-scale read replicas.
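
The dead-letter pattern above can be sketched in plain Python. This is an in-memory illustration only: in SQS the equivalent behavior comes from the queue's redrive policy (`maxReceiveCount`) rather than from application code, and the `process_with_dlq` function, its `max_attempts` default, and the sample handler are hypothetical.

```python
def process_with_dlq(messages, handler, max_attempts=3):
    """Try each message up to max_attempts times, then park it in a DLQ list."""
    processed, dead_letters = [], []
    for msg in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                processed.append(handler(msg))
                break
            except Exception as exc:  # broad on purpose: any failure counts
                last_error = exc
        else:
            # Every attempt failed: keep the message and error for inspection.
            dead_letters.append(
                {"message": msg, "error": str(last_error), "attempts": max_attempts}
            )
    return processed, dead_letters


calls = {}


def sample_handler(msg):
    """Hypothetical handler: 'flaky' fails once, 'poison' always fails."""
    calls[msg] = calls.get(msg, 0) + 1
    if msg == "poison":
        raise ValueError("bad payload")
    if msg == "flaky" and calls[msg] < 2:
        raise RuntimeError("transient failure")
    return msg.upper()


processed, dead = process_with_dlq(["ok", "flaky", "poison"], sample_handler)
```

Note that skipping past a dead-lettered message lets later messages overtake it, so a strictly ordered pipeline has to decide whether to halt on failure or rely on sequence checks downstream when the message is reprocessed.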

Scope and Modules Delivered

  • External pharmacy claims integration via Mirth (validation, normalization, routing, reliability patterns)
  • COBOL change capture using Audit Programs with an EFS staging/landing zone
  • Relativity Driver and SQS orchestration for decoupled processing, retries, and backpressure handling
  • Kafka streaming backbone (topic design, partitioning for ordering, retention and replay strategy)
  • Multi-destination delivery: Application AWS RDS, OpenSearch indexes, and Redshift analytical model
  • Configuration and Administration controls for routing/index rules and operational tuning
  • CI/CD pipelines (GitHub Actions, AWS CodeBuild) and operational monitoring (lag, latency, failures, DLQ)
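
As an illustration of the latency monitoring mentioned above, the sketch below computes commit-to-destination latency from two timestamps and checks a percentile against a target. The 90-second default mirrors the latency target stated for this project; the choice of the 95th percentile, the nearest-rank method, and all function names are assumptions made for the example.

```python
import math
from datetime import datetime


def latency_seconds(commit_ts: str, delivered_ts: str) -> float:
    """Seconds between COBOL commit and arrival at a destination (ISO 8601 inputs)."""
    delta = datetime.fromisoformat(delivered_ts) - datetime.fromisoformat(commit_ts)
    return delta.total_seconds()


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_target(latencies, target_seconds=90.0, pct=95):
    """True when the chosen percentile is under the latency target."""
    return percentile(latencies, pct) < target_seconds


one = latency_seconds("2025-01-01T12:00:00+00:00", "2025-01-01T12:00:42+00:00")
healthy = meets_target([12.0, 18.5, 25.0, 40.0, 61.0])
degraded = meets_target([12.0, 18.5, 25.0, 40.0, 140.0])
```

A check like this could feed an alarm alongside consumer lag and DLQ depth, so a slow destination is noticed before the target is breached for long.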

Architecture Highlights

  • End-to-end PBM claims flow: Pharmacy -> Mirth -> COBOL -> AWS
  • Ordered event processing to preserve claims/financial correctness
  • One streaming source of truth (Kafka) feeding OLTP (RDS), OLAP (Redshift), and Search (OpenSearch)
  • Replayable and auditable pipeline with deterministic reprocessing
  • Resilience by design using EFS staging + SQS buffering + safe retries and DLQ patterns

Key Challenges and Solutions

  • Integrating external pharmacy claims reliably across partners/formats:
    Used Mirth as the canonical ingestion point to validate, normalize, and route claim transactions with consistent observability.
  • Preserving strict ordering from COBOL to cloud destinations:
    Implemented sequence-aware processing and Kafka partitioning strategy so events are applied deterministically in commit order.
  • Near-realtime sync without impacting COBOL performance:
    Decoupled capture using Audit Programs and EFS staging to avoid heavy read pressure on COBOL and smooth spikes.
  • Consistency across RDS, OpenSearch, and Redshift with retries:
    Implemented idempotent upserts and version/sequence checks per destination to prevent duplicates and out-of-order updates.
  • Operational resilience under spikes, partial failures, and throttling:
    Used SQS buffering, DLQ/error topics, and controlled Kafka replay procedures to recover safely while meeting near-realtime goals.
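
The sequence-aware processing described above can be sketched as a small reorder gate that releases events only in contiguous sequence order and drops duplicates. It assumes each change record carries a gap-free, monotonically increasing sequence number starting at a known value; the `SequenceGate` class and its method names are hypothetical and not taken from the project's code.

```python
class SequenceGate:
    """Release events strictly in sequence order, buffering early arrivals."""

    def __init__(self, first_seq: int = 1):
        self.next_seq = first_seq
        self._pending = {}  # seq -> event, for events that arrived early

    def offer(self, seq: int, event):
        """Accept one event and return the events that are now safe to apply."""
        if seq < self.next_seq or seq in self._pending:
            return []  # already released or already buffered: a duplicate
        self._pending[seq] = event
        ready = []
        while self.next_seq in self._pending:
            ready.append(self._pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


gate = SequenceGate()
first = gate.offer(2, "claim-update")    # arrives early, held back
second = gate.offer(1, "claim-create")   # fills the gap, releases both
third = gate.offer(2, "claim-update")    # redelivery, ignored
```

A gate like this trades a little latency for determinism: an event waits only until its predecessors arrive, and a redelivered event never reaches the destination twice.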

Security and Compliance

Designed with HIPAA-aligned controls: encryption in transit and at rest, restricted access to integration components and data stores, least-privilege IAM, and audit-friendly logs and metrics for traceability and incident response.

Outcomes

Delivered an ordered, near-realtime PBM integration platform ingesting COBOL transactions into RDS, Redshift, and OpenSearch.

  • Enabled Web Portal self-service and search using OpenSearch, and Tableau reporting on Redshift.
  • Improved reliability via EFS staging, SQS decoupling, and Kafka replayability for safe recovery and backfills.

Summary Table

Category        | Details
Project Title   | Data Integration: RM/COBOL to AWS RDS with Redshift and OpenSearch (Ordered Sync)
Industry        | Healthcare – Pharmacy Benefit Management (PBM)
Role            | Architect & Tech Lead
Duration        | 1.5 years (Aug 2025 completion)
Core Goal       | Near-realtime, strictly ordered sync to RDS, Redshift, OpenSearch
Key Outcome     | Enabled self-service portals, real-time search, and analytics without impacting COBOL
Compliance      | HIPAA, SOC 2, NIST SP 800-53
Data Volume     | High-throughput, high-accuracy claims pipeline (estimated 10k–50k events/hour)
Latency Target  | < 90 seconds from COBOL commit to final destination

Skills

  • Mirth Connect
  • Confluent Kafka
  • Kafka Connect
  • AWS RDS
  • Amazon Redshift
  • Amazon OpenSearch
  • AWS Lambda
  • Amazon EKS
  • Python
  • Amazon SQS
  • Amazon EFS
  • Data Integration
  • Streaming Architecture
  • Ordered Event Processing
  • CI/CD
  • HIPAA-aligned Security
  • Healthcare PBM

Final Thoughts

This project exemplifies modernization through integration, not replacement. It preserved the stability of a mission-critical COBOL system while unlocking the power of modern cloud technologies — all within a secure, auditable, and scalable framework.

Architecture Diagram