Lekshmi Kolappan - Site Reliability Engineer

About Me

Hi, I'm Lekshmi Kolappan — a seasoned SRE/DevOps Engineer with over a decade of experience building reliable, scalable infrastructure. I specialize in AWS, Kubernetes, Docker, and Linux, with expertise in automation and observability.

At Extreme Reach Limited, I drive observability and performance monitoring initiatives. Previously at ADP Inc., I led DevOps transformations that cut release times by 20% and built enterprise-grade infrastructure frameworks.

Beyond tech, I'm passionate about photography, books, and sharing insights through my tech blog. I thrive at the intersection of systems thinking and creative problem-solving.

Technical Skills

Containers and orchestration

Docker, Docker Swarm, Kubernetes, EKS

Scripting

Bash, Python

Logging and monitoring

ELK, Datadog, Prometheus, Grafana

Continuous Integration / Continuous Delivery

Jenkins, ArgoCD, CircleCI

Databases

MySQL, MariaDB, AWS Aurora, Postgres

Configuration Management

Ansible, Chef, Puppet

DevOps Tools

Helm, Kustomize, Packer, Vault

Professional Experience

Lead DevOps Engineer / Site Reliability Engineer

Extreme Reach Limited April 2022 – Present

Architected secure, multi-tier AWS cloud solutions (EC2, S3, RDS, IAM, KMS, ALB, CloudWatch) delivering 99.9% uptime for enterprise applications, while providing 24/7 production troubleshooting, root-cause analysis, and L3 support for the Node.js/TypeScript application stack.
Engineered EKS/AKS Kubernetes cluster and container runtime (CRI) management using Terraform for Infrastructure as Code (IaC), and piloted Istio service mesh on EKS for traffic management, enabling 30% faster deployments that are scalable and reliable across 5+ environments.
Used AWS CDK, CloudFormation, and the Boto3 library extensively to define infrastructure as code, including as reusable library modules.
Deployed a chaos engineering framework on AWS EKS clusters to automate gameday resilience experiments targeting Kubernetes controllers; mentored 6 junior SREs and DevOps engineers (UK, US, India) on chaos-tooling best practices, experiment design, and failure-mode analysis.
Integrated chaos engineering results into a Prometheus/Grafana observability stack via AWS Lambda-driven pipelines, achieving 60% resilience-testing coverage across global teams and reducing MTTR through improved incident-response readiness.
Developed a scalable Machine Image Creation framework and automated GitHub Actions pipeline template using HashiCorp Packer to build, test, and register production-ready AWS AMIs.
Designed and implemented a multi-cloud Infrastructure as Code (IaC) drift detection framework using GitHub Actions, monitoring services, and PagerDuty to track Azure ARM template configurations and AWS CloudFormation, decreasing drift incidents by 70% and ensuring the integrity and consistency of infrastructure definitions.
Integrated Prometheus, Grafana, and Datadog into automated monitoring pipelines with APM span traces and serverless alerting workflows built on AWS Lambda and EventBridge, cutting MTTR by 35% across 10+ microservices in 8 production environments.
Engineered a comprehensive sanity and availability testing framework for a Node.js application with automated end-to-end health checks and uptime monitoring, and integrated SonarQube for static code analysis and quality gating, ensuring 99.9% reliability.
Architected and rolled out an ArgoCD-based GitOps platform using the Apps-of-Apps pattern and a single central ArgoCD instance managing multiple tenant clusters (EKS and AKS); implemented ApplicationSet generators, sync waves, automated rollbacks, and custom health checks.
Engineered an enterprise-grade Open Policy Agent (OPA) + Gatekeeper policy-as-code framework using Rego to enforce resilience, security, and compliance standards (e.g., Pod Security Standards, replica/high-availability requirements, liveness/readiness probes, anti-affinity rules, and image trust policies).

Senior DevOps Engineer

ADP Inc. December 2017 – December 2021

Leveraged Kubernetes via EKS and other AWS SaaS tools to cut release times by 20%
Packaged AWS IAM configuration as a reusable Terraform module and used it for authentication with the KIAM server on EKS
Assisted with refining the SDLC to fit system needs using Jenkins CI/CD
Developed an infrastructure audit and automated testing framework for AWS cloud infrastructure using Chef InSpec
Acted as a DevOps enabler, specializing in Agile, IaC, orchestration, monitoring, and alerting

Systems Analyst

NTT Inc December 2015 – December 2017

Assisted in transitioning a monolithic Node.js application to a microservices architecture
Implemented container orchestration with Docker Swarm, adding self-healing capability and reducing container maintenance by 50%
Migrated from the public Docker registry to a self-hosted Docker Trusted Registry (DTR) for better security
Established a solution-based approach to software development using unified, reusable artifacts managed in JFrog Artifactory

Linux Administrator

Comodo Inc May 2013 – December 2015

Set up and managed 40 Linux servers with 99.95% uptime
Centralized the configuration management leveraging Ansible for 100+ servers
Automated web server content deployments via Ansible playbooks
Deployed 2+ software iterations per day for 2 years, increasing customer satisfaction by 25%

Education

Bachelor of Engineering

Anna University 2008

I studied Electronics and Communication Engineering.
