Job Description: Kafka Architect (Platform & Developer Focus)

Location: REMOTE

Position Overview: We are seeking a highly experienced Kafka Architect with 8-10+ years of expertise in Kafka administration. This is a strategic role: rather than a hands-on developer or administrator, we need someone with deep architectural knowledge of Kafka and a strong ability to communicate complex concepts effectively. The ideal candidate will have extensive experience designing, documenting, and optimizing both the platform and developer aspects of Kafka, with a focus on messaging and streaming capabilities, and will be well versed in implementing Kafka in diverse environments, able to generalize concepts across different Kafka distributions (including Confluent and others).

Key Responsibilities:

Platform Focus: Infrastructure, Deployment, Operations, and Security

  • Cluster Sizing & Capacity Planning:

    • Define brokers per cluster, partition count, and replication factor based on data volume, retention policies, and throughput needs (see the provisioning sketch after this list).
    • Design scaling configurations (horizontal and vertical) for Kafka clusters.
    • Establish optimal network configurations to ensure low-latency performance.
    • Define ZooKeeper configurations (if not using KRaft mode) or an alternative metadata management approach.
  • Storage Considerations:

    • Provide recommendations for optimal disk configurations, with a focus on NVMe SSDs.
  • High Availability & Fault Tolerance:

    • Define optimal replication factor (RF) strategies for both production and non-production environments.
    • Design broker failover strategies and leader election mechanisms.
    • Develop multi-region and multi-AZ deployment strategies for Kafka.
  • Security, Audit & Compliance:

    • Implement and recommend Kafka authentication strategies (SASL, Kerberos, OAuth, TLS).
    • Design authorization mechanisms (ACLs, RBAC) for Kafka.
    • Advise on key management strategies, key rotation, and secure storage.
    • Define auditing best practices for tracking access and changes to Kafka resources.
    • Ensure encryption for both data in-transit (TLS) and at-rest (disk encryption).
    • Advise on compliance frameworks (e.g., SOX, CCPA) and ensure Kafka adheres to necessary standards.
  • Monitoring & Observability:

    • Advise on metrics collection using tools like Prometheus, Grafana, and Confluent Control Center.
    • Implement security monitoring tools to detect and respond to threats in real time.
    • Provide recommendations for monitoring disk usage and log aggregation (e.g., Elasticsearch, Kibana, Splunk).
    • Implement lag monitoring strategies using tools like Burrow or Kafka UI.
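
To make the sizing and availability expectations above concrete, here is a minimal provisioning sketch using Kafka's Java AdminClient. The bootstrap address, topic name, partition count, and replication settings are illustrative assumptions only; real values would come from the data-volume, retention, and throughput analysis described in this list.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class TopicProvisioningSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; in practice this points at the target cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Illustrative sizing only: 12 partitions for parallelism, RF=3 for fault tolerance.
            NewTopic topic = new NewTopic("orders.events", 12, (short) 3);
            topic.configs(Map.of(
                    // Time-based retention: 7 days.
                    TopicConfig.RETENTION_MS_CONFIG, "604800000",
                    // With RF=3 and min.insync.replicas=2, acks=all writes survive one broker failure.
                    TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```

In practice these settings are usually applied through declarative tooling (e.g., Terraform or GitOps pipelines) rather than ad hoc programs, but the knobs being tuned are the same ones the architect is expected to define.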

Developer Focus: Data Retention, Disaster Recovery, and Cost Optimization

  • Data Retention & Cleanup:

    • Define log segment configurations and cleanup policies (delete vs. compact); see the configuration sketch after this list.
    • Provide recommendations for Kafka compaction processes and scheduling.
    • Advise on time-based vs. size-based retention policies to optimize resource usage.
  • Disaster Recovery & Backup:

    • Define strategies for cross-cluster replication and cluster linking.
    • Set recovery point objectives (RPO) and recovery time objectives (RTO).
    • Define automated backup verification and recovery procedures.
    • Develop Kafka backup strategies, including configuration and topic-level backups.
  • Cost Optimization:

    • Recommend strategies for optimizing consumer group performance.
    • Define dynamic partition rebalancing and scaling strategies.
    • Recommend optimal data retention policies and efficient data compression formats.
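
As a companion sketch for the retention and cleanup items above, the snippet below moves a hypothetical changelog-style topic to a combined compact-plus-delete cleanup policy and adds a size-based cap, again using the Java AdminClient. The topic name, segment interval, and retention threshold are placeholders chosen for illustration, not recommendations.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RetentionPolicySketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic used only for illustration.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "customer.profiles");
            Collection<AlterConfigOp> ops = List.of(
                    // "compact,delete": keep the latest record per key, plus a retention safety net.
                    new AlterConfigOp(new ConfigEntry(TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT + "," + TopicConfig.CLEANUP_POLICY_DELETE),
                            AlterConfigOp.OpType.SET),
                    // Roll segments hourly so closed segments become eligible for compaction promptly.
                    new AlterConfigOp(new ConfigEntry(TopicConfig.SEGMENT_MS_CONFIG, "3600000"),
                            AlterConfigOp.OpType.SET),
                    // Size-based cap per partition (illustrative 10 GiB).
                    new AlterConfigOp(new ConfigEntry(TopicConfig.RETENTION_BYTES_CONFIG, "10737418240"),
                            AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```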

Qualifications:

  • 8-10+ years of experience in Kafka architecture and administration (platform-focused).
  • Strong knowledge of Kafka internals, cluster design, security, and monitoring tools.
  • Proven ability to document and communicate complex Kafka platform architectures.
  • Deep understanding of Kafka's role in both messaging and streaming use cases.
  • Experience with different Kafka implementations (e.g., Confluent, Apache Kafka, others).
  • Strong knowledge of distributed systems, scaling, and capacity planning.
  • Familiarity with disaster recovery strategies and backup procedures.
  • Experience with security best practices, including authentication, encryption, and access control.
  • Understanding of compliance regulations and how to implement them in Kafka.
  • Familiarity with metrics collection, monitoring, and observability tools (e.g., Prometheus, Grafana, Burrow).
  • Excellent communication skills, both written and verbal, with the ability to engage with multiple stakeholders.

Preferred Skills:

  • Expertise in cloud-native Kafka implementations and multi-cloud architectures.
  • Experience with KRaft mode or alternative metadata management strategies.
  • Familiarity with automation and orchestration tools in Kafka environments.