<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Arjun_MK]]></title><description><![CDATA[Arjun_MK]]></description><link>https://content.mkarjun.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 07 Apr 2026 21:29:02 GMT</lastBuildDate><atom:link href="https://content.mkarjun.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[DeepSeek Unleashed – Part 1: From Bedrock to Control]]></title><description><![CDATA[By Arjun Manoj Kumar K, DevOps Engineer | AI Enthusiast
“November 30th  2022. ChatGPT dropped. And everything changed.”
LLMs went from research labs to everyday tools — fast. But spinning one up yourself? Still feels... a little rough.
In this two-par...]]></description><link>https://content.mkarjun.com/deepseek-from-bedrock-to-control</link><guid isPermaLink="true">https://content.mkarjun.com/deepseek-from-bedrock-to-control</guid><category><![CDATA[Deepseek]]></category><category><![CDATA[AI]]></category><category><![CDATA[AWS]]></category><category><![CDATA[bedrock]]></category><category><![CDATA[ollama]]></category><dc:creator><![CDATA[Arjun Manoj Kumar]]></dc:creator><pubDate>Mon, 28 Apr 2025 06:11:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745820245979/6aecdd5f-9899-4e9c-847e-ba15d0e31cee.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>By Arjun Manoj Kumar K, DevOps Engineer | AI Enthusiast</em></p>
<p>“November 30<sup>th</sup>  2022. ChatGPT dropped. And everything changed.”</p>
<p>LLMs went from research labs to everyday tools — fast.<br />But spinning one up yourself? Still feels... a little rough.</p>
<p>In this two-part blog, I’ll walk you through how to run <strong>DeepSeek</strong>, one of the most capable open-source LLMs today — first on <strong>AWS Bedrock</strong>, then on your <strong>own machine</strong> with <strong>Ollama</strong>.</p>
<p>Before we dive into terminals and code, we'll make sense of a few key AI building blocks:<br /><em>Inference, weights, parameters, tokens — and even a peek at</em> what exactly is this new architecture called <strong>MAMBA</strong>?</p>
<p>Don’t worry, I’ve got you.<br />This isn’t a lecture — it’s a dev-to-dev download.</p>
<p>Grab your coffee. Let’s boot up the cloud.</p>
<h3 id="heading-llm-101-inference-weights-and-transformers-oh-my"><strong>LLM 101: Inference, Weights, and Transformers (Oh My!)</strong></h3>
<p>Before we get our hands dirty with DeepSeek, let’s cover a few basics of large language models (LLMs) in plain English:</p>
<ul>
<li><p><strong>Parameters and Weights:</strong> You can think of <em>parameters</em> as the internal configuration or “knowledge” of an LLM. These are often referred to as <em>weights</em> – essentially giant tables of numbers that the model has learned during training. Modern LLMs have lots of these: the full DeepSeek-R1, for example, has <strong>671 billion</strong> parameters, while the distilled variant we'll use packs about <strong>8 billion</strong>. Generally, the more parameters, the more knowledge or nuance a model can capture – but also the more computing power needed to run it. When someone says “a 13B model,” they mean 13 billion parameters. These weights are what get loaded into memory when you “boot up” the model.</p>
</li>
<li><p><strong>Tokens &amp; Tokenization:</strong> When you type a sentence into a model like DeepSeek, it doesn’t actually understand words the way we do. Instead, it breaks everything down into tokens, which are like bite-sized pieces of text. A token might be a full word ("hello"), part of a word ("tion"), or even just punctuation ("!"). This process is called tokenization, and it’s the model’s first step in trying to make sense of language.</p>
</li>
</ul>
<blockquote>
<p>For example, the sentence:</p>
<p>"Transformers are cool!"</p>
<p>might get split into tokens like:</p>
<p>["Transform", "ers", " are", " cool", "!"]</p>
</blockquote>
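<p>A toy sketch of that greedy longest-match idea in Python (the tiny vocabulary here is invented purely for illustration – real tokenizers like DeepSeek’s use learned byte-pair-encoding vocabularies with tens of thousands of entries):</p>

```python
# Toy greedy subword tokenizer -- an illustration of the idea only,
# NOT a real BPE tokenizer. The tiny vocabulary below is invented.
TOY_VOCAB = ["Transform", "ers", " are", " cool", "!", " the", "To", "ken"]

def toy_tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position;
    fall back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        match = ""
        for piece in TOY_VOCAB:
            if text.startswith(piece, i) and len(piece) > len(match):
                match = piece
        if not match:            # unknown character -> emit it alone
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens

print(toy_tokenize("Transformers are cool!"))
# ['Transform', 'ers', ' are', ' cool', '!']
```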
<ul>
<li><strong>Embeddings:</strong> Once text is tokenized, the model still can’t process it as-is — it needs numbers. That’s where <strong>embeddings</strong> come in. An embedding is basically a <strong>vector representation</strong> of a token — a list of numbers that captures its meaning based on context. It’s like giving the model a way to understand that:</li>
</ul>
<blockquote>
<p>"dog" and "puppy" are similar,<br />but "dog" and "banana"... not so much.</p>
</blockquote>
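<p>Here’s a minimal sketch of that similarity idea, using hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of learned dimensions) and cosine similarity:</p>

```python
import math

# Hand-made 3-D "embeddings" -- the numbers are invented purely to
# illustrate the geometry; real models learn these vectors from data.
EMB = {
    "dog":    [0.90, 0.80, 0.10],
    "puppy":  [0.85, 0.90, 0.15],
    "banana": [0.10, 0.20, 0.95],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(EMB["dog"], EMB["puppy"]))   # close to 1.0 -> similar
print(cosine(EMB["dog"], EMB["banana"]))  # much lower -> dissimilar
```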
<ul>
<li><p><strong>Inference (vs. Training):</strong> <em>Training</em> an LLM is the heavyweight process of updating those billions of weights by showing the model tons of examples (this is done by the model’s creators and can take weeks on supercomputers). <em>Inference</em>, on the other hand, is what <strong>we</strong> do as users: it’s when we feed input to a pre-trained model and get an output (a prediction). Inference is basically <em>using</em> the model to generate text. For instance, asking DeepSeek a question and getting an answer is an inference. It’s much less intensive than training, but for big models it can still be slow or require powerful hardware. In short, inference = <strong>running the model forward</strong> to get results (no learning happening at that time).</p>
</li>
<li><p><strong>Transformer Architecture:</strong> Most state-of-the-art LLMs today (including DeepSeek) are based on something called the <em>Transformer</em> architecture. Without diving too deep, the Transformer is a type of neural network that introduced a magic ingredient known as “self-attention.” Attention mechanisms let the model weigh the importance of different words in the input when generating each word of the output. Imagine you’re writing a reply to someone – you “pay attention” to the relevant parts of what they said. Transformers do this at scale and in multiple layers. This architecture was revolutionary because it allowed models to handle very long text inputs and outputs far more effectively than older recurrent neural networks. If you hear about “GPT-style” models, that basically means a Transformer-based model. So when we use DeepSeek, under the hood there’s a Transformer network doing the heavy lifting: reading the input text, figuring out which bits matter, and then producing a coherent response one token at a time. Neat, huh?</p>
</li>
</ul>
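<p>To make “self-attention” concrete, here’s a stripped-down, single-head sketch in plain Python. It skips the learned query/key/value projections real Transformers use (here Q = K = V = the raw token vectors), but the core mechanic – score every pair of tokens, softmax the scores into attention weights, take a weighted sum – is the same:</p>

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention, single head, no projections."""
    d = len(X[0])
    out = []
    for q in X:                                    # one query per token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]                      # scaled dot products
        weights = softmax(scores)                  # "how much to attend"
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])            # weighted sum of values
    return out

# Three "token" vectors; the first two point in similar directions,
# so they attend to each other more strongly than to the third.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(self_attention(X))
```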
<p>Now, keep in mind that the AI field moves fast. Transformers were bleeding edge for a while, but researchers are exploring new architectures to address some of Transformers’ limitations, like the heavy compute cost for very long inputs. One of those contenders is MAMBA, which we’ll touch upon now.</p>
<h3 id="heading-beyond-transformers-meet-mamba-the-new-kid-on-the-block"><strong>Beyond Transformers: Meet MAMBA, the New Kid on the Block</strong></h3>
<p>You might have heard whispers about architectures beyond Transformers. <strong>MAMBA</strong> is one such new approach that’s generating buzz. So, what is MAMBA?</p>
<p>In a nutshell, <strong>MAMBA is a new LLM architecture that integrates something called the Structured State Space (S4) model to handle lengthy sequences of data</strong>​. The S4 technique borrows ideas from older sequence models (like RNNs and signal processing) to efficiently capture long-range dependencies. This means MAMBA-based models aim to <strong>manage very long context lengths with better efficiency</strong> than transformers, which tend to get slow or memory-hungry as context grows. In fact, MAMBA tries to combine the best of various approaches – recurrent models, convolutional models, etc. – to simulate long-term dependencies in text​.</p>
<p>What does that mean for you? Potentially, future LLMs (maybe a future DeepSeek version?) could handle book-sized inputs or huge documents <em>much</em> faster using architectures like MAMBA. For example, where a transformer might choke or take forever on a thousands-of-words prompt, a MAMBA model could breeze through with linear scalability (no quadratic slowdown as the text grows). This is still an area of active research, but it’s exciting because it shows how the landscape is evolving.</p>
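<p>A quick back-of-the-envelope comparison shows why this matters – the numbers below are purely illustrative, but the shapes (quadratic vs. linear in sequence length) are the point:</p>

```python
# Illustrative cost comparison. Transformer self-attention compares
# every token with every other token (roughly n^2 work); state-space
# models like MAMBA process the sequence in roughly linear time.
def attention_cost(n):   # ~quadratic in sequence length
    return n * n

def ssm_cost(n):         # ~linear in sequence length
    return n

for n in (1_000, 10_000, 100_000):
    ratio = attention_cost(n) / ssm_cost(n)
    print(f"{n:>7} tokens: attention is ~{ratio:,.0f}x more work")
```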
<p>For this blog, we’ll be working with DeepSeek, which (as of the R1 version) is still a Transformer-based model. But it’s cool to know what’s on the horizon. If nothing else, you can drop “structured state-space models” in your next friends chat to sound a bit extra intelligent. 😄 The key takeaway: <strong>Transformer models are the current standard, but architectures like MAMBA hint at a future where LLMs get even faster and handle more data with ease.</strong></p>
<p>With the groundwork set, let's dive into DeepSeek!</p>
<h3 id="heading-what-is-deepseek"><strong>What Is DeepSeek?</strong></h3>
<ul>
<li><p><strong>DeepSeek-R1</strong> is an open-source large language model that burst onto the scene in late 2024. It was in the spotlight for its strong reasoning abilities in areas like math and coding. In fact, DeepSeek has demonstrated performance on par with some of OpenAI’s models, at a fraction of the cost or resource requirements. For example, it reportedly scored about <strong>79.3% on the AIME 2024 math competition</strong> and did well on a software engineering benchmark – these are tough tests, so those numbers turned heads. The model was developed by a team with the goal of pushing reasoning capabilities in an open manner. It’s released under the MIT license.</p>
</li>
<li><p>Being open-source has a couple of big implications. <strong>First</strong>, we can actually look under the hood – the architecture, training data, and weights are accessible, not a proprietary secret. That allows the community to understand its strengths and limitations, and even improve it. <strong>Second</strong>, we’re free to deploy DeepSeek on our own infrastructure. No dependency on a specific cloud provider or API – you could run it in AWS (as we will soon), on your own servers, or even on a beefy laptop. This freedom lets organizations avoid vendor lock-in and gives developers like us a playground to experiment without hefty paywalls.</p>
</li>
<li><p>DeepSeek comes in a few flavors. The main DeepSeek-R1 model is a large one (671 billion parameters – well beyond GPT-3’s scale). Recognizing not everyone has the means to run a huge model, the creators also provided <strong>distilled,</strong> smaller versions – essentially compressed models that retain a lot of the capabilities of the big one, but with far fewer parameters. In this guide, we’ll use <strong>deepseek-r1:8b</strong> (an 8-billion-parameter variant distilled from the original). It’s much easier to host and experiment with 8B than, say, 32B or 70B. Think of it as DeepSeek’s little sibling that hasn’t been to the gym – smaller in size, still pretty strong.</p>
</li>
</ul>
<p>Alright, now that we know who DeepSeek is and have some context, let’s actually run this thing! We’ll start with the cloud route: AWS Bedrock.</p>
<h3 id="heading-running-deepseek-on-aws-bedrock"><strong>Running DeepSeek on AWS Bedrock</strong></h3>
<p>Let’s go step-by-step through getting DeepSeek running on Bedrock.</p>
<ul>
<li><strong>Accessing DeepSeek on Bedrock</strong></li>
</ul>
<p>First, you’ll need an AWS account with access to Bedrock. In the AWS Management Console, navigate to <strong>Amazon Bedrock</strong>. Under <strong>Bedrock configurations</strong>, find <strong>Model access</strong> in the sidebar. Here, you’ll see a list of available models from various providers (Amazon’s own models, Anthropic, Cohere, Stability, and many others – Bedrock is a bit like a model marketplace). Look for the <strong>DeepSeek</strong> section. You should see <strong>DeepSeek-R1</strong> listed as a model under that category.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745917960793/e209542d-2834-42f3-867d-b53cfa413728.png" alt class="image--center mx-auto" /></p>
<p><em>Screenshot: The AWS Bedrock console’s Model Access panel, showing DeepSeek-R1 in the list of base models (with access granted).</em></p>
<p>In my case, after I clicked “Request access” for DeepSeek-R1 and got the green light, I was ready to use the model. You might only have to do this once. Now the DeepSeek model is enabled for your account. Time to test it out!</p>
<ul>
<li><strong>Testing DeepSeek in the Bedrock Playground</strong></li>
</ul>
<p>AWS Bedrock provides a handy UI called the <strong>Playground,</strong> where you can interact with models directly in the browser. On the left menu, click <strong>Playgrounds</strong>, then choose <strong>Chat/Text</strong> (since DeepSeek is a text generation model). At the top of the Playground interface, there’s a drop-down to <strong>select a model</strong>. Click that, and you’ll see categories for each provider. Choose <strong>DeepSeek</strong>, and then select <strong>DeepSeek-R1</strong> as the model. DeepSeek supports up to a 128K-token context window on Bedrock. Hit <strong>Apply</strong> to confirm the model selection.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745817506652/5a89e426-30b9-4bf6-92a5-f64ee0e98f84.png" alt class="image--center mx-auto" /></p>
<p><em>Screenshot: Selecting DeepSeek-R1 in the Bedrock Playground. Under “Model providers” you can see DeepSeek (alongside Amazon, Anthropic, etc.), and we’ve chosen the DeepSeek-R1 model.</em></p>
<p>With the model selected, you can enter a prompt in the text box and hit send. For example, try something simple like: <strong>“Hello DeepSeek, can you solve 2+2?”</strong>. In a few seconds, you should get a response from the model right there in the browser. (DeepSeek might return something like “4” or a brief explanation – it’s good at complex math reasoning also.). This playground is nice for quick experiments. I gave it a more complex math word problem from a textbook and it actually wrote out the reasoning steps and solution, which was impressive to see live.</p>
<p>So the Playground confirms everything is working. But the real power is using DeepSeek in your applications via code. Let’s see how to do that.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745918798969/e04e6a4d-dd9b-4c49-b08f-49a903a48d12.png" alt class="image--center mx-auto" /></p>
<ul>
<li><strong>Challenges &amp; Workarounds:</strong> When I first started, I did the CLI approach. The gotcha was that the CLI returns the result in a bytestream, which isn’t directly printed to the terminal. That’s why I had to output to a file and then check the file. It felt a bit roundabout – copying JSON strings and opening files in VS Code just to see the answer. Switching to a simple Python script with boto3 made things easier, since I could see the output directly in my VS Code terminal. So, <strong>if you’re iterating as a developer, I’d recommend using boto3 or an SDK</strong> instead of the CLI.</li>
</ul>
<p>Another snag I hit had nothing to do with Bedrock or DeepSeek itself: I was using an SSO role, so I initially ran into a few permission blocks there.</p>
<ul>
<li><strong>Invoking DeepSeek via Python (with Boto3)</strong></li>
</ul>
<p>Once you’ve got access to AWS Bedrock, there are two ways to interact with DeepSeek:</p>
<ol>
<li><p>InvokeModel – for simple, one-shot prompts (like asking a question, getting an answer).</p>
</li>
<li><p>Converse – for multi-turn chat-like conversations where the model remembers previous context.</p>
</li>
</ol>
<p>Even though DeepSeek is tuned for dialogue, you can use either. If you're building a chatbot, Converse feels more natural. But if you just want a quick answer or run one-off prompts from your script or terminal, InvokeModel works great too.</p>
<ul>
<li><strong>Using Python (Boto3 SDK)</strong></li>
</ul>
<p>Let’s be real — CLI gets messy fast. For better flexibility and cleaner code, you can use Python with Boto3. Here's a full working script I ran myself:</p>
<p><a target="_blank" href="https://github.com/mkarjun/AWSDeepSeekTest/blob/main/DS.py"><strong>https://github.com/mkarjun/AWSDeepSeekTest/blob/main/DS.py</strong></a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1745919723089/44c4e533-378a-490a-b9af-de53a4e0a6c7.png" alt class="image--center mx-auto" /></p>
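<p>For reference, here’s a minimal sketch of both call styles with boto3. Treat the model ID <code>us.deepseek.r1-v1:0</code> and the InvokeModel body fields as assumptions – check the Bedrock console and model docs for the exact values in your region:</p>

```python
import json

# Minimal sketch of both Bedrock call styles. The model ID below and the
# InvokeModel body fields are assumptions -- confirm them in your console.
MODEL_ID = "us.deepseek.r1-v1:0"  # cross-region inference profile (assumed)

def build_converse_messages(prompt: str) -> list:
    """Message list in the shape the Converse API expects."""
    return [{"role": "user", "content": [{"text": prompt}]}]

def build_invoke_body(prompt: str, max_tokens: int = 512) -> str:
    """JSON body for InvokeModel (field names may differ per model version)."""
    return json.dumps({"prompt": prompt, "max_tokens": max_tokens})

def run_examples():
    """Perform both calls. Needs AWS credentials with Bedrock access."""
    import boto3
    client = boto3.client("bedrock-runtime")

    # One-shot prompt via InvokeModel; note the response body is a
    # bytestream, which is why the CLI route felt roundabout.
    resp = client.invoke_model(modelId=MODEL_ID,
                               body=build_invoke_body("What is 2+2?"))
    print(json.loads(resp["body"].read()))

    # Multi-turn style via Converse:
    resp = client.converse(modelId=MODEL_ID,
                           messages=build_converse_messages("What is 2+2?"))
    print(resp["output"]["message"]["content"][0]["text"])
```

<p>Calling <code>run_examples()</code> then performs the actual requests; the payload-building helpers are kept separate so they can be reused or tested without touching the network.</p>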
<ul>
<li><strong>Tuning and Guardrails</strong></li>
</ul>
<p>Bedrock also comes with <strong>Guardrails</strong> – basically content filters and controls you can enable for the model. In the Bedrock console, you can define rules to block certain kinds of content. For example, you could create a guardrail to filter out any output that looks like sensitive info or contains specific keywords (like “politics” or others)​. This is really useful if you’re deploying an app in production and want to ensure the model doesn’t say something out-of-line or reveal confidential data. DeepSeek being fully managed on Bedrock means it can take advantage of these enterprise features (monitoring, logging, access control, etc.) that AWS provides​.</p>
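<p>As a rough sketch, a guardrail definition looks something like the dictionary below. The field names mirror boto3’s <code>create_guardrail</code> parameters as I understand them – verify against the current Bedrock API reference before relying on them:</p>

```python
# Sketch of a guardrail definition. The field names mirror boto3's
# bedrock.create_guardrail parameters as I understand them -- treat
# them as assumptions and verify against the Bedrock API reference.
guardrail_config = {
    "name": "demo-guardrail",
    "blockedInputMessaging": "Sorry, I can't help with that request.",
    "blockedOutputsMessaging": "Sorry, I can't share that response.",
    # Block prompts/outputs containing a specific keyword:
    "wordPolicyConfig": {
        "wordsConfig": [{"text": "politics"}],
    },
    # Block sensitive information (e.g. email addresses):
    "sensitiveInformationPolicyConfig": {
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "BLOCK"}],
    },
}

def create_demo_guardrail(bedrock_client):
    """Create the guardrail (needs AWS credentials; not called here)."""
    return bedrock_client.create_guardrail(**guardrail_config)
```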
<p><em>Alright – we’ve successfully run DeepSeek in the cloud! We saw that it can be done through a nice UI or via code, and noted some tricks (like prompt formatting and using the SDK). But what if you want to run DeepSeek <strong>offline, on your own hardware</strong>? Maybe you don’t want to rely on an internet service, or you want to experiment with the model locally for free. That’s where <strong>Ollama</strong> comes in.</em></p>
<h3 id="heading-conclusion-part-1-from-spinning-up-to-taking-control"><strong>Conclusion – Part 1: From Spinning Up to Taking Control</strong></h3>
<p>So far, we’ve tamed the beast in the cloud. DeepSeek runs cleanly through Bedrock — efficient, powerful, wrapped in AWS polish. You typed. It answered. Simple.</p>
<p>But let me tell you something:<br />This... was just the beginning.</p>
<p>What happens when you unplug from the cloud? When you go completely local — no safety nets, no billing dashboards, no external dependencies. Just you… and the raw weight of an 8 billion parameter model pulsing in your machine’s RAM.</p>
<p>In Part 2, we ditch the cloud and go full rogue. We’ll drag DeepSeek down from the heavens and run it where it wasn’t designed to thrive — in the trenches of your laptop, powered by coffee and curiosity.</p>
<p>The revolution won't be deployed — it’ll be downloaded.</p>
<p>See you in Part 2.</p>
]]></content:encoded></item><item><title><![CDATA[DeepSeek Unleashed – Part 2: The Core]]></title><description><![CDATA[In Part 1, we saw DeepSeek in the cloud. Bedrock handled the weight, scaled the compute, and gave us a sleek, managed interface to explore the model’s power — all buttoned up in AWS-grade polish.But now, it’s time to pull the plug — literally.No cons...]]></description><link>https://content.mkarjun.com/deepseek-unleashed-part-2-the-core</link><guid isPermaLink="true">https://content.mkarjun.com/deepseek-unleashed-part-2-the-core</guid><category><![CDATA[Deepseek]]></category><category><![CDATA[llm]]></category><category><![CDATA[AI]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[ollama]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Arjun Manoj Kumar]]></dc:creator><pubDate>Sun, 27 Apr 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770206849412/b7c0f8ed-c0be-4420-a1d6-b7a7e913e66b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In Part 1, we saw DeepSeek in the cloud. Bedrock handled the weight, scaled the compute, and gave us a sleek, managed interface to explore the model’s power — all buttoned up in AWS-grade polish.<br />But now, it’s time to pull the plug — literally.<br />No consoles. No surprise costs. No “your session will start shortly.”<br />Just you, a terminal, and a language model quietly sitting in your machine, waiting to be unleashed.</p>
<p>This isn’t just fun — it’s freedom.</p>
<p>In this part, we’re going fully local. We’ll run DeepSeek using a tool called Ollama, turning your laptop (or server) into a private LLM playground. Think of it like strapping a jet engine to your dev environment — minus the cloud bill.<br />And here's the kicker: this isn't limited to laptops. The same setup can be extended to EC2, ECS, or any on-prem rig, giving you a fully private, production-ready LLM — one that you own, control, and can lock behind your own firewalls.<br />If the cloud was convenience, this is sovereignty. <strong>Let’s light it up!</strong></p>
<p><strong>Running DeepSeek Locally with Ollama</strong></p>
<p>Now for the DIY enthusiasts: running an LLM on your own machine. This can be intimidating because of the hardware requirements, but it’s getting easier every day thanks to projects like <strong>Ollama</strong>. Ollama is an open-source tool that makes it super simple to download and run large language models on your computer. It handles a lot of the heavy lifting (like managing model files, running a server, and even providing a web UI). Ollama can act as your personal “AI engine” – kind of like having a mini AI, but on your laptop!</p>
<p>Here’s how I got DeepSeek working locally with Ollama:</p>
<p><strong>1. Setting Up Ollama</strong></p>
<p><strong>Install Ollama:</strong> Ollama supports macOS and Linux natively, and Windows via WSL or a preview build. On Mac, the easiest way is Homebrew: <code>brew install ollama</code>. On Linux, you can use Docker (or there are .deb packages for some distros). I went the Docker route, which was as simple as running:</p>
<p>docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama</p>
<p>This pulls the latest Ollama Docker image, runs it in the background (<code>-d</code>), and opens port 11434, which Ollama uses to serve its API. We also mount a volume for model data (<code>-v ollama:/root/.ollama</code>) so that models you download are persisted on your disk (you don’t have to re-download every time you restart the container). If you have an NVIDIA GPU and want to use it, add <code>--gpus=all</code> to that command – Ollama can then utilize the GPU to speed up inference. I didn’t have a beefy GPU on my machine, so I ran on CPU, which works but is obviously slower.</p>
<p>Quick sanity check: if you visit http://localhost:11434 in your browser, you should see a short “Ollama is running” message – that confirms the server and its API are up. (For a chat-style web UI, you can layer a separate front end such as Open WebUI on top of that API.)</p>
<p><strong>2. Obtaining the DeepSeek Model Weights</strong></p>
<p>Here’s the part where many people get stuck: you need the actual model files to run an LLM locally. These files can be <strong>large</strong>. The full DeepSeek-R1 model is huge (dozens of GB), so as mentioned we’ll use the 8B distilled version. Even 8B can be multi-gigabyte, but it’s manageable.</p>
<p><strong>Quantization:</strong> To make models feasible to run on typical hardware, people often use <em>quantized</em> versions. Quantization means reducing the precision of the model’s numbers (e.g., from 16-bit floats to 4-bit integers), dramatically shrinking the size with only minor loss in accuracy. There’s a popular format called <strong>GGUF</strong> (a variant of the GGML format) which is great for local models. In fact, Ollama works natively with GGUF models. Many community model providers (like <strong>TheBloke</strong> on Hugging Face) release LLMs in GGUF format for easy local usage. I looked for DeepSeek-R1 in GGUF form and found one. You might find it on Hugging Face under a repository like unsloth/DeepSeek-R1-GGUF or similar.</p>
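<p>The size math behind quantization is simple enough to sketch – parameters times bytes per parameter. (Real GGUF files add some overhead for metadata and mixed-precision layers, so treat these as ballpark figures, not exact download sizes.)</p>

```python
# Rough model-size arithmetic: parameters x bytes-per-parameter.
# Ballpark figures only -- real GGUF files carry extra metadata and
# often keep some layers at higher precision.
def model_size_gb(params: float, bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1e9   # bytes -> GB

for bits in (16, 8, 4):
    print(f"8B model @ {bits}-bit: ~{model_size_gb(8e9, bits):.0f} GB")
```

<p>So an 8B model drops from roughly 16 GB at 16-bit down to about 4 GB at 4-bit – which is what makes running it on an ordinary laptop feasible at all.</p>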
<p><strong>Tip:</strong> DeepSeek-R1 is available directly in the Ollama library, so the easiest path is simply <code>ollama pull deepseek-r1:8b</code> – no manual GGUF import needed for the standard distills. Manual import is only necessary for community quantizations that aren’t in the library.</p>
<p>For the DeepSeek models available through Ollama, check out <a target="_blank" href="https://ollama.com/library/deepseek-r1">https://ollama.com/library/deepseek-r1</a></p>
<p><strong>3. Running Inference Locally with Ollama</strong></p>
<p>Now for the exciting part: actually chatting with DeepSeek locally! To start using the model, you simply run:</p>
<p>ollama run deepseek-r1</p>
<p>This will initiate the model and drop you into an interactive prompt (a bit like a REPL or a chat interface in your terminal). The first time it loads the model, you’ll see it allocate memory, and it might take a few seconds to warm up those billions of parameters. Once it’s ready, you’ll get a prompt where you can type. Try an example: <em>“Hello, what is the capital of France?”</em> and hit enter. DeepSeek (running entirely on your machine now) will think for a moment and then hopefully respond with <em>“The capital of France is Paris.”</em> – or perhaps a slightly longer explanation, depending on how it was configured to respond. (If you ever train a tiny model from scratch yourself, this same first test tends to produce modern-art-like gibberish; a pre-trained model like DeepSeek should answer coherently.)</p>
<p>I was giddy the first time I saw a serious LLM generating text on my humble machine [16 GB RAM without GPU] without any cloud services involved. It truly feels like having a pet genie in your computer.</p>
<p>You can have a multi-turn conversation by continuing to type prompts. Ollama will, by default, handle the conversation context for you, up to whatever the model’s context length is (often these GGUF quantized models support 2048 or 4096 tokens of context by default, though some go higher). Keep in mind, <em>all</em> the computation is happening on your hardware now, so don’t expect lightning speed unless you’ve got a monster rig. In my test with a 7B model on CPU, simple questions took a few minutes, and complex requests took longer for a few hundred tokens of answer. If you have a decent GPU and enabled it, times will be much better.</p>
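<p>Beyond the interactive terminal, Ollama also exposes a local REST API on port 11434, which is handy for scripting. A minimal sketch using only the standard library (it assumes Ollama is running locally and the <code>deepseek-r1</code> model has been pulled):</p>

```python
import json
from urllib import request

# Ollama serves a local HTTP API on port 11434; the /api/generate
# endpoint takes a model name and a prompt. Assumes Ollama is running
# and the deepseek-r1 model has already been pulled.
def build_generate_request(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str, model: str = "deepseek-r1") -> str:
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = request.Request("http://localhost:11434/api/generate",
                          data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (only works with a local Ollama instance running):
# print(ask_ollama("What is the capital of France?"))
```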
<p><strong>System Requirements &amp; Limitations:</strong> What kind of system do you need to do this comfortably? There’s no one-size answer – it depends on model size and quantization. Generally, the more RAM the better. For an 8B parameter model, if it’s 4-bit quantized, you might only need ~4-8 GB of RAM free. My HP Pavilion with 16GB RAM could handle it, but it was using a big chunk of memory. If you try a 30B model, you’d likely need 16GB+ RAM and a lot of patience (or ideally a GPU with at least 15-20 GB VRAM if running 4-bit). Always check the model documentation; many community sharers will note recommended RAM. As a rule, start with smaller models (7B, 13B) and work your way up. It’s amazing what even a 7B model can do nowadays.</p>
<p>it’s incredibly empowering to run LLMs locally. No API costs, no rate limits, and you can even <strong>hack</strong> the model if you’re into that (for example, applying fine-tuning or feeding it custom data). But that’s a story for another day!</p>
<p><strong>Cloud vs Local – Which to Choose?</strong></p>
<p>We’ve successfully run DeepSeek on AWS Bedrock <strong>and</strong> our local machine via Ollama. Give yourself a high five!! That was a lot to cover. By now, you’ve seen that there are trade-offs between the two approaches:</p>
<ul>
<li><p><strong>AWS Bedrock (Cloud)</strong>: Huge advantages in ease of use. You get <strong>scalability</strong>, <strong>power</strong> (use large versions of the model with long context lengths and strong performance), and managed <strong>security</strong> (data stays under your control with AWS’s policies, plus guardrails to filter outputs)​. The downside is cost – you pay per request/token, so heavy usage can rack up a bill. Also, you need an internet connection and for some folks, dealing with AWS setup might be a bit of a learning curve (though we hope our steps made it clearer).</p>
</li>
<li><p><strong>Ollama (Local)</strong>: You have <strong>full control</strong> and <strong>privacy</strong> (your data never leaves your machine). It’s essentially free after the upfront hardware investment. It’s great for development, prototyping, or just tinkering with AI for learning. You can even use it offline. The challenges are the <strong>hardware limits</strong> – you can’t easily run gigantic models if your computer can’t handle them, and inference will be slower than on cloud GPUs. You may occasionally have to jump through a hoop (like converting models to the right format), though projects like Ollama are smoothing that out rapidly. You can also extend the same setup into a private architecture by running Ollama on EC2 or in container environments.</p>
</li>
</ul>
<p><strong>Conclusion – Part 2: The Model in the Machine</strong></p>
<p>So there it is. You’ve come full circle — from invoking DeepSeek through a cloud API to summoning it straight from your SSD.<br />No more waiting for tokens to drip in from the internet.<br />No more wondering if you’ve hit your rate limit. Just your local compute, spinning up billions of neurons at your command.<br />It’s not perfect — sure, it might be slower, and your laptop fan may now sound like a mini jet engine. But it’s yours. And it’s powerful.<br />This opens up new ways to build — offline-first apps, privacy-heavy experiments, or even fine-tuned niche models stitched together into your own Frankenstein stack.</p>
<p><strong><em>The cloud was Act One. Local is the twist. And what comes next? That’s entirely up to you. !</em></strong></p>
<p>In reality, you’ll probably use both setups depending on the need: Bedrock for fast, production-grade delivery and Ollama (or EC2-based local deployments) when you want full control, offline capability, or air-gapped privacy.</p>
<p>If you’re building something more secure, spinning up an on-prem rig or EC2 instance with DeepSeek gives you a private LLM pipeline with zero data leaving your infrastructure.</p>
<p>It’s like having your own Jarvis — only quieter, and hopefully not plotting your downfall.</p>
<p><em>“As for me? I keep wondering — do we really need 600B+ parameter models, when we can run smaller, purpose-driven ones and stitch them together like modular AI legos?</em></p>
<p><em>Might sound simplistic, but that’s where my head’s at.”</em></p>
<p>P.S. A huge shoutout to the minds who’ve made my diving into LLMs a whole lot more accessible and inspiring — special thanks to <strong>Chip Huyen</strong> for her clear writing and practical perspectives, and to <a target="_blank" href="https://karpathy.ai/"><strong>Andrej Karpathy</strong></a> for always making the complex feel intuitive.</p>
<p>Thanks for reading! Got stuck or curious? Ping me — happy to help.<br />Now if you’ll excuse me, my laptop's trying to explain quantum physics to my coffee mug.</p>
]]></content:encoded></item><item><title><![CDATA[The Future of Platform Engineering: 2025]]></title><description><![CDATA[Platform engineering in 2025 is no longer a niche domain within IT; it is the cornerstone of digital innovation, operational efficiency, and business agility. With the confluence of AI, hybrid computing, and decentralized teams, platform engineering ...]]></description><link>https://content.mkarjun.com/the-future-of-platform-engineering-2025</link><guid isPermaLink="true">https://content.mkarjun.com/the-future-of-platform-engineering-2025</guid><category><![CDATA[Platform Engineering ]]></category><category><![CDATA[Devops]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[#Iac #terraform #devops #aws]]></category><dc:creator><![CDATA[Arjun Manoj Kumar]]></dc:creator><pubDate>Fri, 20 Dec 2024 07:21:24 GMT</pubDate><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734678712738/a7600e00-e593-4b6d-b16f-519be498a524.jpeg" alt class="image--center mx-auto" /></p>
<p>Platform engineering in 2025 is no longer a niche domain within IT; it is the cornerstone of digital innovation, operational efficiency, and business agility. With the confluence of AI, hybrid computing, and decentralized teams, platform engineering has evolved from building support systems to architecting the very fabric of tomorrow’s enterprise.</p>
<p>This blog explores what platform engineering looks like in 2025, focusing on the trends, tools, and principles shaping the field.</p>
<hr />
<h2 id="heading-a-paradigm-shift-from-supportive-to-strategic"><strong>A Paradigm Shift: From Supportive to Strategic</strong></h2>
<p>In the past, platform engineering teams were perceived as back-office facilitators, focused on building infrastructure for developers. By 2025, this perception has radically changed. Platform teams now operate as strategic enablers, directly driving business outcomes. This transition is fueled by the growing importance of composable platforms, which enable rapid assembly and reconfiguration of business capabilities.</p>
<p>Consider an energy company that builds digital twins for its operations. The platform engineering team provides a unified digital infrastructure—including APIs, reusable components, and self-service capabilities—to connect IoT data from sensors, AI for predictive analytics, and blockchain for immutable records. The result? Quicker time-to-market and unparalleled agility.</p>
<p>Platform engineering’s mission is no longer limited to enabling developers; it extends to empowering fusion teams—cross-functional units comprising developers, business analysts, and domain experts—to innovate at speed while adhering to architectural principles.</p>
<hr />
<h2 id="heading-the-core-pillars-of-platform-engineering-in-2025"><strong>The Core Pillars of Platform Engineering in 2025</strong></h2>
<p><strong>1. Self-Service Platforms</strong><br />The hallmark of platform engineering in 2025 is the self-service model. Platforms are designed to reduce friction for developers and other end-users by abstracting away complexity. These self-service capabilities extend beyond traditional developer environments to include no-code and low-code interfaces for business technologists.</p>
<p>For example, a retailer’s platform engineering team might build a self-service portal that allows marketing teams to deploy personalized recommendation engines without needing to write a single line of code. Underneath this simplicity lies a robust architecture powered by Kubernetes, serverless computing, and AI-driven monitoring tools.</p>
<p><strong>2. AI-Infused Operations</strong><br />Artificial Intelligence plays an indispensable role in 2025. AI augments traditional monitoring and observability tools, enabling proactive incident resolution, resource optimization, and even predictive scalability. Tools like AIOps platforms monitor patterns, identify anomalies, and autonomously implement fixes.</p>
<p>Additionally, AI governance platforms ensure that autonomous systems are ethical, transparent, and aligned with organizational values—a critical need in an era where AI touches every facet of business operations.</p>
<p><strong>3. Composable Architecture</strong><br />Platform teams adopt a composable approach, where platforms are modular and can be assembled like Lego blocks to suit varying needs. In a composable world, businesses can adapt rapidly to market changes by recombining modular platform services such as APIs, machine learning pipelines, and infrastructure components.</p>
<p>For instance, a financial services company can use its composable platform to quickly launch a new lending product by reusing components for user authentication, credit scoring, and payment processing.</p>
<p><strong>4. DevSecOps by Design</strong><br />In 2025, security is deeply embedded into every stage of the software delivery lifecycle. Platform engineers leverage tools that integrate security policies as code, enforce zero-trust architectures, and provide automated vulnerability scanning.</p>
<p>Fusion teams, armed with secure-by-default platforms, can focus on innovation without worrying about compliance or cyber risks. Security becomes an enabler rather than a bottleneck.</p>
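<p>To make “security policies as code” concrete, here is a deliberately minimal sketch of a pipeline gate that rejects a manifest requesting privileged containers. It is illustrative only: a real platform would use a policy engine such as OPA/Gatekeeper rather than grep, and the manifest below is a made-up example.</p>

```shell
#!/bin/sh
# Minimal "policy as code" gate (illustrative only): block any manifest that
# requests a privileged container. Real platforms would use a policy engine
# such as OPA/Gatekeeper; this manifest is a made-up example.
set -u

cat > /tmp/deploy.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          securityContext:
            privileged: true
EOF

if grep -q 'privileged: true' /tmp/deploy.yaml; then
    echo "POLICY VIOLATION: privileged container requested"
else
    echo "policy check passed"
fi
```

<p>In practice the same check would run as an early CI stage, so violations fail fast, long before any deployment step.</p>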
<p><strong>5. Multi-Cloud and Hybrid Computing</strong><br />Platform engineering has fully embraced the multi-cloud and hybrid computing paradigms. Enterprises leverage cloud-agnostic platforms to achieve cost efficiency, resilience, and scalability. Infrastructure as Code (IaC) tools are augmented with AI to enable seamless provisioning, management, and scaling across diverse environments.</p>
<p>A hybrid computing approach also allows organizations to run sensitive workloads on-premises while leveraging the cloud for AI training or high-performance analytics.</p>
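<p>As a sketch of what an IaC-driven multi-cloud workflow looks like in practice, the dry run below promotes one Terraform module across a cloud environment and an on-premises one. The workspace and var-file names are hypothetical, and each command is printed and logged rather than executed; drop the wrapper to run it against a real project.</p>

```shell
#!/bin/sh
# Dry-run sketch of promoting one Terraform module across two environments.
# Workspace and var-file names are hypothetical; each step is printed and
# logged instead of executed (remove the 'run' wrapper to execute for real).
set -eu
LOG=/tmp/iac_steps.log
: > "$LOG"
run() { echo "+ $*" | tee -a "$LOG"; }

run terraform init
run terraform workspace select aws-prod
run terraform plan -var-file=aws-prod.tfvars -out=aws.plan
run terraform apply aws.plan

run terraform workspace select onprem
run terraform plan -var-file=onprem.tfvars -out=onprem.plan
run terraform apply onprem.plan
```

<p>The same module, fed different variable files per workspace, is what lets teams keep sensitive workloads on-premises while provisioning cloud capacity from a single codebase.</p>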
<hr />
<h2 id="heading-emerging-trends-shaping-platform-engineering"><strong>Emerging Trends Shaping Platform Engineering</strong></h2>
<p><strong>1. Post-Quantum Cryptography</strong><br />With quantum computing advancing rapidly, platform engineers are transitioning to post-quantum cryptography. These new cryptographic standards ensure that sensitive data remains secure even in a world where traditional encryption methods are obsolete.</p>
<p><strong>2. Ambient Intelligence</strong><br />Platforms now leverage ambient intelligence—low-cost, pervasive sensors and tags—to gather real-time data from physical environments. This data feeds into the platform’s analytics and automation systems, enabling smarter decision-making. In logistics, ambient intelligence improves supply chain efficiency by tracking inventory conditions and predicting disruptions.</p>
<p><strong>3. Hyperautomation</strong><br />Platforms incorporate hyperautomation capabilities that extend beyond traditional RPA. By integrating machine learning, AI, and process mining, platforms automate complex workflows across business units, saving costs and improving accuracy.</p>
<p><strong>4. Sustainability as a Metric</strong><br />Energy-efficient computing is a critical focus for platform teams. Platforms are built to minimize carbon footprints through efficient algorithms, optimized hardware, and intelligent workload placement. Sustainability KPIs are tracked alongside traditional metrics like performance and uptime.</p>
<hr />
<h2 id="heading-architecture-workflow-using-a-platform-engineering-approach"><strong>Architecture Workflow Using a Platform Engineering Approach</strong></h2>
<p>Let’s assume a fintech company wants to launch a new digital wallet product. Here is a step-by-step outline of how a platform engineering approach would be applied:</p>
<ol>
<li><p><strong>Requirement Gathering:</strong></p>
<ul>
<li><p>Product team defines features like user authentication, fund transfers, transaction history, and AI-driven fraud detection.</p>
</li>
<li><p>Platform engineering team collaborates with the product team to identify reusable components and APIs.</p>
</li>
</ul>
</li>
<li><p><strong>Designing the Platform:</strong></p>
<ul>
<li><p>The team selects a composable architecture leveraging existing microservices for authentication, payment processing, and notifications.</p>
</li>
<li><p>Kubernetes is used for container orchestration to ensure scalability.</p>
</li>
<li><p>Security policies (e.g., OAuth2 for authentication) are embedded into the design.</p>
</li>
</ul>
</li>
<li><p><strong>Building Self-Service Capabilities:</strong></p>
<ul>
<li><p>A developer portal is created, providing APIs, templates, and deployment scripts for the product team to use.</p>
</li>
<li><p>Low-code interfaces are added for non-technical teams to configure workflows like user onboarding.</p>
</li>
</ul>
</li>
<li><p><strong>Integration and Testing:</strong></p>
<ul>
<li><p>APIs are integrated into the frontend application.</p>
</li>
<li><p>Automated pipelines test the application for performance, security, and compliance.</p>
</li>
</ul>
</li>
<li><p><strong>Deployment and Monitoring:</strong></p>
<ul>
<li><p>The platform uses a CI/CD pipeline to deploy the wallet to a multi-cloud environment.</p>
</li>
<li><p>AI-driven observability tools monitor user behavior and system performance, providing real-time insights and alerts.</p>
</li>
</ul>
</li>
<li><p><strong>Continuous Improvement:</strong></p>
<ul>
<li><p>Feedback loops are established to gather data from end-users and developers.</p>
</li>
<li><p>Updates and new features are rolled out incrementally, leveraging the composable platform’s modularity.</p>
</li>
</ul>
</li>
</ol>
<p><strong>This workflow showcases how platform engineering accelerates product delivery while maintaining security, scalability, and flexibility.</strong></p>
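<p>The six steps above map naturally onto a delivery pipeline. The dry-run sketch below shows one possible shape; the specific tool invocations are illustrative (the workflow does not prescribe them) and are logged rather than executed.</p>

```shell
#!/bin/sh
# Dry-run sketch of the wallet delivery pipeline described above.
# Tool invocations are illustrative, not prescribed; each is logged
# rather than executed so the sketch is safe to run anywhere.
set -eu
LOG=/tmp/pipeline.log
: > "$LOG"
stage() { echo "== $1 ==" | tee -a "$LOG"; shift; for c in "$@"; do echo "  + $c" | tee -a "$LOG"; done; }

stage "build"   "docker build -t wallet-api:latest ."
stage "test"    "pytest tests/" "conftest test k8s/deployment.yaml"       # security/compliance gates
stage "deploy"  "kubectl apply -k overlays/aws" "kubectl apply -k overlays/gcp"  # multi-cloud rollout
stage "observe" "kubectl rollout status deploy/wallet-api"
```

<p>The key property is that the product team only touches the top of this file; the platform team owns the stages underneath, which is exactly the self-service split the workflow describes.</p>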
<hr />
<h2 id="heading-skills-and-tools-in-the-platform-engineers-toolbox"><strong>Skills and Tools in the Platform Engineer’s Toolbox</strong></h2>
<p><strong>1. Skills</strong></p>
<ul>
<li><p><strong>System Design:</strong> Expertise in architecting modular, scalable systems.</p>
</li>
<li><p><strong>AI &amp; ML Proficiency:</strong> Understanding AI-driven tools and how to integrate them into platforms.</p>
</li>
<li><p><strong>Security:</strong> Knowledge of zero-trust principles, cryptography, and compliance.</p>
</li>
<li><p><strong>Collaboration:</strong> Ability to work with fusion teams, communicating technical concepts in business terms.</p>
</li>
</ul>
<p><strong>2. Tools</strong></p>
<ul>
<li><p><strong>Platform Orchestration:</strong> Kubernetes, Terraform, and Ansible.</p>
</li>
<li><p><strong>AI Augmentation:</strong> Datadog with machine learning capabilities, OpenAI APIs.</p>
</li>
<li><p><strong>Observability:</strong> Grafana, Prometheus, and distributed tracing tools like Jaeger.</p>
</li>
<li><p><strong>Security:</strong> HashiCorp Vault, Aqua Security, and post-quantum cryptographic libraries.</p>
</li>
</ul>
<hr />
<h2 id="heading-measuring-success-new-kpis-for-platform-engineering"><strong>Measuring Success: New KPIs for Platform Engineering</strong></h2>
<p>In 2025, success metrics for platform engineering go beyond operational indicators like uptime. They include:</p>
<ul>
<li><p><strong>Platform Adoption:</strong> Percentage of fusion teams using the platform.</p>
</li>
<li><p><strong>Time to Market:</strong> Reduction in development cycles for new products.</p>
</li>
<li><p><strong>Self-Service Efficiency:</strong> Number of self-service deployments versus IT-assisted ones.</p>
</li>
<li><p><strong>Sustainability Metrics:</strong> Energy consumption and carbon footprint of platform operations.</p>
</li>
<li><p><strong>Developer Satisfaction:</strong> Feedback scores from fusion teams and developers.</p>
</li>
</ul>
<hr />
<h2 id="heading-real-world-impact-case-studies-referred-from-gartner-paper"><strong>Real-World Impact: Case Studies referred from Gartner paper</strong></h2>
<p><strong>1. Cepsa</strong><br />Cepsa, an energy company, used platform engineering to transition to a decentralized model. Their foundational digital platforms incorporated self-service tools and automated nonfunctional requirements like security and observability. The result? A 67% increase in platform adoption and faster delivery of digital products.</p>
<p><strong>2. ABN AMRO</strong><br />This financial institution modernized its infrastructure by adopting a self-service platform model. Developers could provision resources, deploy microservices, and manage APIs without needing IT support. The platform reduced operational costs and improved customer satisfaction by enabling faster feature rollouts.</p>
<hr />
<h2 id="heading-the-road-ahead"><strong>The Road Ahead</strong></h2>
<p>As we stand on the threshold of 2025, platform engineering is no longer about keeping the lights on. It is about leading the charge toward a future where technology seamlessly aligns with business objectives, where AI and human intelligence coexist symbiotically, and where platforms are not just enablers but accelerators of innovation.</p>
<p>In this new era, the platform engineer is the unsung hero, blending technical mastery with strategic foresight to build systems that power the enterprise of tomorrow. The opportunities are limitless, but the responsibility is immense.</p>
<p>Welcome to the future of platform engineering. Are you ready to build it?</p>
]]></content:encoded></item><item><title><![CDATA[Elevate Your Container Game: Simplifying Container Development with Finch]]></title><description><![CDATA[Table of Contents

Introduction

The Era of Containers

Introducing Finch: A Solution for Container Development

Components of Finch

Conclusion and FAQ


Introduction
In this blog, we will introduce Finch, based on a session by Akil Mohan on AWS com...]]></description><link>https://content.mkarjun.com/elevate-your-container-game-simplifying-container-development-with-finch</link><guid isPermaLink="true">https://content.mkarjun.com/elevate-your-container-game-simplifying-container-development-with-finch</guid><category><![CDATA[Container Development]]></category><category><![CDATA[Finch]]></category><category><![CDATA[Nerdctl]]></category><category><![CDATA[Lima]]></category><category><![CDATA[Buildkit]]></category><category><![CDATA[Docker]]></category><category><![CDATA[podman]]></category><dc:creator><![CDATA[Arjun Manoj Kumar]]></dc:creator><pubDate>Fri, 29 Dec 2023 10:44:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1703662869792/fdc329e6-4c67-49c6-a280-2b0cafd7a76f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-table-of-contents"><strong>Table of Contents</strong></h2>
<ul>
<li><p><a target="_blank" href="https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/arjun/Documents/Abilytics/Blogs/blog.md#introduction">Introduction</a></p>
</li>
<li><p><a target="_blank" href="https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/arjun/Documents/Abilytics/Blogs/blog.md#the-era-of-containers">The Era of Containers</a></p>
</li>
<li><p><a target="_blank" href="https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/arjun/Documents/Abilytics/Blogs/blog.md#introducing-finch:-a-solution-for-container-development">Introducing Finch: A Solution for Container Development</a></p>
</li>
<li><p><a target="_blank" href="https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/arjun/Documents/Abilytics/Blogs/blog.md#components-of-finch">Components of Finch</a></p>
</li>
<li><p><a target="_blank" href="https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/arjun/Documents/Abilytics/Blogs/blog.md#conclusion-and-faq">Conclusion and FAQ</a></p>
</li>
</ul>
<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>In this blog, we will introduce Finch, an open-source client for container development, based on a session by <a target="_blank" href="https://www.linkedin.com/in/akhilerm/">Akhil Mohan</a>, a software engineer and containerd maintainer, at <a target="_blank" href="https://communityday.awsugkochi.in/">AWS Community Day Kochi 2023</a>. We will explore the challenges developers face in container development and how Finch aims to simplify the process. The agenda for this blog includes an overview of container development, the hurdles developers face, an introduction to Finch, and a discussion of its architecture and components.</p>
<ul>
<li><p>Akhil Mohan: a software engineer and maintainer of containerd</p>
</li>
<li><p><a target="_blank" href="https://github.com/runfinch/finch?tab=readme-ov-file">Finch</a>: An open-source client for container development</p>
</li>
<li><p>Challenges in container development: Limited tools for developers on Windows and macOS, the need for Linux VMs, and network setup</p>
</li>
<li><p>Blog agenda: Overview of container development, challenges faced by developers, introduction to Finch, and discussion on its architecture and components</p>
</li>
</ul>
<h2 id="heading-the-era-of-containers"><strong>The Era of Containers</strong></h2>
<p>In today's software development landscape, containers have become increasingly prevalent. Containers are used to deploy code and applications, and even development environments. They offer several advantages that make them a popular choice for developers.</p>
<p>One of the main advantages of using containers is the ease of code deployment. Containers provide a consistent and reliable environment for running applications, ensuring that the code works the same way across different systems. This eliminates the "it works on my machine" problem and streamlines the deployment process.</p>
<p>Containers also offer significant advantages for development environments. Developers can create containerized environments that include all the necessary tools and dependencies, making it easy to set up and reproduce development environments across different machines. This eliminates the need for developers to spend time installing and configuring tools and reduces the chances of compatibility issues.</p>
<p>However, there are compatibility challenges across operating systems. Containers run natively on Linux, but Windows and macOS implement them differently. macOS in particular cannot run Linux containers directly, because the macOS kernel lacks the isolation primitives that containers depend on.</p>
<p>In the case of macOS, developers often need to set up Linux virtual machines (VMs) to test and run containers. Setting up these VMs can be a time-consuming and complex process. Developers need to create a Linux VM, install container runtimes like Docker or Podman, set up file and network sharing, and ensure proper networking for testing web applications. All these steps can add significant overhead to the initial setup process.</p>
<p>The initial setup process for containers on macOS can have time and cost implications. Developers may spend hours or even days configuring the Linux VM and the necessary tools for container development. This setup process can delay the start of development work and increase the overall development time.</p>
<p>Overall, containers have revolutionized software development by simplifying code deployment and development environments. While they offer many benefits, developers need to navigate compatibility challenges, particularly when working with different operating systems. The need for Linux VMs in macOS adds an additional layer of complexity to the setup process, resulting in potential time and cost implications.</p>
<h2 id="heading-introducing-finch-a-solution-for-container-development"><strong>Introducing Finch: A Solution for Container Development</strong></h2>
<p>AWS has developed Finch as a solution to address the challenges faced by their developers in container development. Finch is an open-source client that aims to simplify the container development process and provide a more streamlined experience for developers.</p>
<p>Finch offers several key features that make it a valuable tool for container development. Firstly, it provides a single CLI (Command Line Interface) for interacting with containers. This means that developers can use the Finch CLI for almost 90% of the commands they would typically use with Docker on Linux. This unified CLI makes it easier for developers to work with containers, regardless of the operating system they are using.</p>
<p>Another important feature of Finch is its compatibility with different chip architectures, specifically amd64 and arm64. This means that developers can use Finch to launch and work with containers on both Intel and M series chips. This compatibility ensures that developers have flexibility in their container development, regardless of the chip architecture they are working with.</p>
<p>Finch also integrates with other open-source tools to provide a comprehensive container development experience. It leverages containerd, an industry-standard container runtime, to manage the lifecycle of containers on the host machine. Additionally, Finch utilizes Lima, a tool for launching Linux virtual machines on macOS, to provide the necessary Linux environment for testing and running containers. The integration of these tools ensures that developers have access to the essential components required for container development.</p>
<p>The architecture of Finch consists of multiple components working together to facilitate container development. Containerd is used as the container runtime, handling tasks such as storing container layers, launching containers, and managing their lifecycle. Lima provides the Linux virtual machine on macOS, allowing developers to test and run containers in a Linux environment. Buildkit, another component of Finch, is responsible for building container images. It takes Dockerfiles and converts them into various output formats, including Docker tarballs and Open Container Initiative (OCI) tarballs. Lastly, Finch utilizes the Container Network Interface (CNI) for setting up networking within containers.</p>
<p>In summary, Finch offers a powerful solution for container development by addressing the challenges faced by developers. With its single CLI, compatibility with different chip architectures, integration with open-source tools, and well-designed architecture, Finch simplifies the container development process and provides a streamlined experience for developers.</p>
<p>Architecture:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1703665607811/eb129405-2980-4338-94a1-e6a3bb3fb792.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-components-of-finch"><strong>Components of Finch</strong></h2>
<p>Finch, the open-source client for container development, is comprised of several key components that work together to simplify the container development process. These components include:</p>
<h3 id="heading-containerd"><strong>Containerd</strong></h3>
<p>Containerd is an industry-standard container runtime that manages the lifecycle of containers on the host machine. It handles tasks such as storing container layers, launching containers, and managing their lifecycle. Containerd is widely used in various platforms, including Kubernetes and AWS services like EKS Fargate and Firecracker. Finch leverages Containerd to ensure efficient and reliable container management.</p>
<h3 id="heading-nerdctl"><strong>Nerdctl</strong></h3>
<p>nerdctl is a Docker-compatible command-line interface (CLI) for containerd. It allows developers to interact with containerd directly, giving them full control over their containers. nerdctl enables developers on macOS or Windows to use familiar Docker commands and seamlessly integrate with the Finch ecosystem.</p>
<h3 id="heading-lima"><strong>Lima</strong></h3>
<p>Lima is a tool that provides a Linux virtual machine (VM) on macOS. It addresses the challenge of running Linux containers on macOS by creating a Linux environment for testing and running containers. Lima handles the setup of the Linux VM, including file sharing and network configuration. With Lima, developers can easily test and run Linux-based containers on their Mac machines.</p>
<h3 id="heading-buildkit"><strong>Buildkit</strong></h3>
<p>Buildkit is a powerful tool used for building container images. It takes Dockerfiles and converts them into various output formats, such as Docker tarballs and Open Container Initiative (OCI) tarballs. Buildkit offers advanced features and optimizations for faster and more efficient container image builds. Finch utilizes Buildkit to streamline the image building process and improve developer productivity.</p>
<h3 id="heading-installation-and-getting-started"><strong>Installation and Getting Started</strong></h3>
<p>To get started with Finch, you can install it using package managers like Homebrew on macOS. Once installed, initialize the Finch VM with the "finch vm init" command. This will create and start the Linux virtual machine within your macOS environment. From there, you can use the Finch CLI to interact with containers, just as you would with the Docker CLI on Linux.</p>
<p>With Finch, you can easily pull container images, run containers, and leverage the various components seamlessly. Finch provides a unified and efficient experience for container development, regardless of the operating system or chip architecture you're using.</p>
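<p>A minimal quick-start session looks like the sketch below (the commands follow the Finch README, and "hello-finch" is the project’s own smoke-test image). The script is guarded so it simply prints an install hint on machines where Finch is absent.</p>

```shell
#!/bin/sh
# Finch quick start (commands per the Finch README; hello-finch is the
# project's own smoke-test image). Guarded so the sketch no-ops cleanly
# on machines where Finch is not installed.
LOG=/tmp/finch_demo.log

if command -v finch >/dev/null 2>&1; then
    finch vm init                                   | tee "$LOG"     # one-time: create and start the Linux VM
    finch vm status                                 | tee -a "$LOG"  # should report "Running"
    finch run --rm public.ecr.aws/finch/hello-finch | tee -a "$LOG"  # pull and run a test container
    finch images                                    | tee -a "$LOG"  # list locally stored images
else
    echo "finch not installed (try: brew install --cask finch)" | tee "$LOG"
fi
```

<p>From here, most Docker muscle memory carries over: "finch build", "finch run", and "finch compose up" behave much like their Docker counterparts.</p>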
<h2 id="heading-conclusion-and-faq"><strong>Conclusion and FAQ</strong></h2>
<p>In conclusion, Finch is an open-source client that aims to simplify container development by addressing the challenges faced by developers. It provides a single CLI for interacting with containers, making it easier for developers working on different operating systems. With its compatibility with different chip architectures and integration with industry-standard tools like containerd and Lima, Finch offers a streamlined and efficient container development experience.</p>
<p>Key points discussed in this blog include:</p>
<ul>
<li><p>The advantages of using containers for code deployment and development environments</p>
</li>
<li><p>The challenges faced by developers, especially on Windows and macOS</p>
</li>
<li><p>An introduction to Finch as a solution for container development</p>
</li>
<li><p>The components and architecture of Finch, including <a target="_blank" href="https://containerd.io/">containerd</a>, <a target="_blank" href="https://github.com/containerd/nerdctl">nerdctl</a>, <a target="_blank" href="https://github.com/lima-vm/lima">Lima</a>, and buildkit</p>
</li>
</ul>
<p>Frequently Asked Questions:</p>
<h3 id="heading-is-finch-available-for-use"><strong>Is Finch available for use?</strong></h3>
<p>Yes, Finch is available as an open-source tool that can be installed using package managers like Homebrew on macOS.</p>
<h3 id="heading-how-can-i-contribute-or-get-further-assistance"><strong>How can I contribute or get further assistance?</strong></h3>
<p>If you have any inquiries or want to contribute to Finch, you can contact <a target="_blank" href="https://www.linkedin.com/in/akhilerm/">Akhil Mohan</a> or <a target="_blank" href="https://www.linkedin.com/in/arjun-manoj-kumar-k/">Arjun Manoj Kumar</a> on platforms like LinkedIn or Twitter. They will be available to provide assistance and guidance related to open-source projects and contributions.</p>
]]></content:encoded></item></channel></rss>