At RisingStack, we are highly interested in building scalable and resilient software architectures. We know that a lot of our readers share our enthusiasm, and that they want to learn more about the subject too.
To expand our blogging & training initiatives, we decided to launch a new series called Top of the Stack which focuses on architecture design, development trends & best practices for creating scalable applications.
In the first episode of Top of the Stack, we interviewed Patrick Kua, the CTO of N26, a successful banking startup from Germany. Patrick is a microservices expert who spent 13 years at ThoughtWorks, and then decided to lead the tech team for a modern banking company which already serves more than 500,000 users.
During our roughly 30-minute conversation, we discussed a wide range of topics to understand how Patrick's team chooses the languages and frameworks they use, how they handle testing, DevOps & continuous integration, and how they develop microservices.
The conversation is available in both audio and written format. For the transcript, read on!
To help you navigate a little, we list the topics we cover with the anchors you can use:
- How did you join N26?
- What was it like to work with Martin Fowler?
- What languages do you use at N26?
- Why did you ditch Scala for Kotlin?
- What databases do you prefer at N26?
- What communication protocols do you use with Microservices?
- How do you handle testing?
- What kind of deployment strategies do you have?
- Let's discuss automation & continuous integration.
- Did you face any scaling problems?
- How do you prevent errors from cascading between your services?
- How do you handle caching & ensure idempotency?
- What technologies are you looking forward to in 2018?
- How did you convince your team to use the Chaos Monkey?
- Ideal microservices size, throwaway Microservices, GDPR in EU Law.
Patrick Kua Interview Transcript:
Welcome everybody to the Top of the Stack Podcast by RisingStack, where we talk about the services and infrastructures that developers build. I'm Csaba Balogh, your host, sitting with our co-host Tamas Kadlecsik, the CEO of RisingStack.
We are going to talk about the architecture of N26, a successful German startup. N26 is a mobile banking platform which allows its customers to do everything a traditional bank does - except in an easier way, and from anywhere in the world. The main markets of N26 are Germany, Austria, France, Spain and Italy, and they currently have over 500,000 users.
Our guest today is Patrick Kua, CTO of N26. Welcome Patrick, we are super happy you are here today and you could make it.
Patrick Kua - N26:
Hi, thank you very much for having me on the podcast, I'm excited to share some of the behind-the-scenes part of what makes a mobile bank successful.
RisingStack: Patrick, can you tell us a bit more about your background and how you became a part of N26?
Sure, yeah! My story is kind of interesting because I've been consulting for the last 13.5 years with a firm called ThoughtWorks. Many of you may have heard of it since the chief scientist at ThoughtWorks was Martin Fowler, and we were very proud of pushing new ideas and new technologies into the industry. One of the biggest shifts was the introduction of Continuous Delivery, which came out of ThoughtWorks, and I'm really honored to have worked alongside some of the people who contributed to those ideas.
I am quite a new joiner to N26 - I’ve been there for three months. They approached me to take on the role of CTO and I thought it was exciting to have this responsibility after doing a lot of consulting across lots of different industries including banking, finance, insurance amongst many others. As a consultant, you add a lot of value by bringing in new ideas and new approaches, but at the same time you end up being a little bit frustrated because you always rely on influencing people. You try to convince people to take on some of these choices that you hope will have a positive impact.
For me, one of the reasons I took on this role as the CTO is that I was really excited to meet some of the technologists, engineers, infrastructure people behind the scenes, and I was already impressed by the technology and services that they've developed. I really wanted to help them build on that base platform and lead the way around building an amazing product, which is growing and growing.
I think you mentioned we had 500,000 users. We had 500,000 customers in August last year and we grow on average by 1,500 to 2,000 new customers every day. And there are lots of exciting opportunities around how we grow. Joining the company was an exciting opportunity for me because one of the challenges that I used to consult on was helping organisations scale. I think it's exciting to be alongside a company as it scales and grows, and to be able to support that key engineering culture.
RisingStack: When it comes to microservices, Martin Fowler is a name that you really cannot get around - did you work with him personally?
Yeah, so in ThoughtWorks Martin Fowler does a lot of visiting. I think he's very humble - he talks about how he's not the person who has a lot of the ideas. Over the 13.5 years I've been lucky enough to spend time with Martin across lots of different types of situations. I worked with him during the internal meetings about the TechRadar that ThoughtWorks publishes a couple of times a year. I've even had the joy of having Martin Fowler on-site for consulting with clients. He's very kind as well in terms of sharing his platform - MartinFowler.com - and his readership with others. I'm publishing a couple of articles out there, and I think that's a really generous platform that Martin provides.
RisingStack: Yeah definitely. I think when we started out with microservices we learned most of the things from his blog too, so thank you. Can you tell us a bit more about the main languages you use for developing in N26?
And we also have a back-end experiment around TypeScript, but we decided not to go down that path. We're currently exploring Kotlin as a move towards a more modern JVM-based language. Obviously, Java 9 is coming out at some point and that would have been a natural candidate, but I think we're also interested to see how Kotlin will develop. Google is giving it a warm embrace around the Android platform.
RisingStack: Can you tell us a little bit more about why you ditched Scala and what you like about Kotlin?
I think what's really interesting from my perspective, and what the team is exploring, is the kind of simplicity that Kotlin has as a language. I've been a big fan of IntelliJ since its inception, and I've been very impressed by the pragmatism of the JetBrains team. I think that pragmatism really comes across in the Kotlin language - it’s something that helps you get on with the tasks that you need to do and gets out of your way while you do them. I think they've been really focused on ease of use, which really shines in their IDE IntelliJ, and I'm really intrigued to see how that will continue to evolve in Kotlin. This is one of the reasons that as a team at N26 we’re moving towards Kotlin more than Scala.
The feedback of that team is that they are pretty happy with it. Obviously, we don't have the challenge of everyone having to maintain the same code. It really belongs to that team, so I think it's the language that they're most proficient in, and they are happy to maintain it since it hasn't led to a lot of unnecessary complexity.
RisingStack: Thanks for explaining it. What databases do you use at N26, and for what purposes?
We're surprisingly very ordinary. I think what I really like about the technology team is that we've picked very simple tools that are very well known and very stable. That lets us focus on speed and solving the problem of building a bank that the world loves to use. What's interesting about our stack and particularly our databases is that it's nothing special at the moment.
We have a combination of MySQL and Postgres databases. MySQL is used for most of the applications and services, while the Postgres database was used for reporting purposes. But we're moving away from that to Redshift for building our data warehouse. We haven't really specialized around storage yet, but it does what we need it to do and it scales for what we need right now.
RisingStack: What communication protocols do you use between your services?
We have a set of microservices. Most of the time a lot of the services are RESTful endpoints for synchronous communication. And then, we have a bunch of the asynchronous communications using queuing via SQS. These are the two protocols that we're mostly using, and we also have a couple of specialized protocols for the payments.
RisingStack: Can you tell us a bit more about how you handle testing and what kind of tests you have in place right now?
I think testing is interesting in our environment and I was very surprised about it when I joined. I think it's impressive for a bank to have this level of automation, which is much higher than what I've seen in a lot of other, more traditional banks, and I think that it allows us to move very quickly. We have pretty much standard automation tests. Every team is expected to be running unit and integration tests. Where we do a lot more integration with partners, we rely a lot more on integration tests against their APIs, because with any partner, what's written down in a specification is often not quite how a system behaves, so we get a lot better feedback through those levels of tests.
We also have end-to-end automation tests. We're getting a little bit better at some of our end-to-end tests, including the full mobile applications, so we're developing suites that test the entire set of microservices plus the front-end. And we also have a number of tests around our deployment. So we have a very strong automated continuous deployment or delivery pipeline, and as part of that, we also run tests when we deploy to make sure that things work well before we roll them out to customers. That's how we maintain scalability and quality with our end users in mind.
RisingStack: You run these tests to make sure everything works fine when you deploy your services. Do you couple those with deployment strategies such as red-black or canary or something like that?
As part of a continuous delivery pipeline, we have what we call a golden server, which is the equivalent of a kind of canary, so that would be one of our steps. In the pipeline, a service typically goes through normal unit testing, and we also have security testing automation in place to check for common vulnerability patterns. Then we package everything up into a deployable.
That gets shipped through different types of testing environments, so we go through integration and acceptance testing environments. Before it gets released, it goes into what we call the golden server, and if that works well then we'll slowly roll that out. Then we have a blue-green process where all the services will be upgraded in one of the areas before we switch over traffic. And then the rest of the services would be updated without a deployable.
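The gated flow described in this answer can be sketched in a few lines of Python. This is only an illustration, not N26's actual pipeline: the stage names, the `run_pipeline` and `blue_green_switch` helpers, and the pass/fail checks are all hypothetical stand-ins for real test suites and traffic switching.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; the check returns True when the gate passes.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(build: str, stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failing gate.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, check in stages:
        if not check(build):
            break  # a failed gate stops the rollout before customers see it
        passed.append(name)
    return passed


def blue_green_switch(active: str) -> str:
    """Return the environment that receives traffic after a successful upgrade."""
    return "green" if active == "blue" else "blue"


# Hypothetical stage order, loosely following the answer above.
stages: List[Stage] = [
    ("unit-tests", lambda build: True),
    ("security-scan", lambda build: True),
    ("integration-env", lambda build: True),
    ("acceptance-env", lambda build: True),
    ("golden-server", lambda build: build != "bad-build"),  # canary-like gate
]

if __name__ == "__main__":
    print(run_pipeline("build-42", stages))
    print(blue_green_switch("blue"))
```

The point of the sketch is the ordering: the canary-like gate sits last, so only builds that survived every earlier stage ever reach it, and traffic is switched only after that gate passes.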
RisingStack: Wow, I think a lot of people dream about having this kind of automation in place. Quite often we have clients coming to us to put some automation in place for them, but usually when we tell them about all these things, they just kind of recoil from the idea of spending so much time on DevOps. But yeah, it's definitely impressive.
What I'm really proud of is that the team had this idea about investment in automation very early on. I see it really paying back because when we release - and we release hundreds of times per week - we’re able to do that with safety in mind and knowing that we'll be able to provide a good quality service as a digital bank. It's a really exciting place for people to work - imagine what's possible with that right direction and the right level of automation done very early on.
And when I think about it, you probably have the same situation with your clients and I was having it when doing consulting as well: It's scary to think about how traditional banks do things.
RisingStack: Do you use any specific CI tools?
So we use Jenkins as the main orchestrator, but we don't use any special CI tools on top of that. The deployment and the entire pipeline are run through it. It's easy with Jenkins to click and configure everything. We've taken automation, source control and the idea of continuous delivery to heart, so the infrastructure is very much source-controlled and managed that way. And so is our continuous delivery pipeline, which in a lot of places is another single point of failure. But for us, it's all source-controlled and managed that way.
RisingStack: Did you face any scaling problems in your current architecture? And if you did how did you solve it?
At the moment our architecture is quite scalable, so we haven't really faced any internal scaling challenges. I think what's interesting is that we have a number of products that we've partnered with other companies for, and unfortunately, we have hit scaling problems with some of their external integrations. From what I understood, I think you were fairly impressed by the level of automation and CD processes that we have in place.
Unfortunately, some of our partners don't have the same level of excitement, so sometimes the only place where we can run the tests is with test accounts in environments, because partners haven't quite got the same level of rigor that we want. We're trying to work with them on that, but I would say that one of the scaling challenges we have is making sure that our partners have the same level of quality that we expect and demand from our own internal services.
RisingStack: How do you prevent errors from cascading between your services?
At the moment we have timeouts and retries as part of that. We haven't got to any level of distributed tracing yet, and I think one of the reasons is that we have really excellent monitoring. For us, the user experience is really key, both in terms of how users use the application and the focus we have on design and usability. It also translates into a really relentless focus on making sure that we know when users are starting to have errors before they do. When we start to detect errors, we have a lot of information on all sorts of endpoints, so we’re able to know when things don't look right, and then the teams can very quickly respond to that.
RisingStack: Can you tell us a little bit more about the timeouts you use? Because, you know, it can be problematic if you just use simple static timeouts and then have longer queries. Can they be served properly? So how do you get around that?
I don't know the exact details because we have quite a lot of different services and it's more up to the team tech leads to make sure that happens. So it's a level of detail I wouldn't be able to honestly say. But I know that we do have some level of timeouts and retries for each team and service.
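As a rough illustration of the "timeouts and retries" pattern mentioned here, the following Python sketch retries a call that times out, waiting longer between attempts. It is not taken from N26's code base: the `call_with_retries` helper, its parameter names, and the default values are assumptions chosen for the example, and the operation is expected to raise the built-in `TimeoutError` when it exceeds its time budget.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[float], T],
    timeout_s: float = 1.0,
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(timeout_s), retrying on TimeoutError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(timeout_s)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            # Wait base_delay_s, then twice that, then four times that, and so on.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying is only safe when the operation can be repeated without side effects, which is why timeouts and retries tend to be discussed together with idempotency. The `sleep` parameter is injectable so the backoff can be tested without actually waiting.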
RisingStack: I see and do you use caching between services? And so when it comes to these infrastructural elements - I would like to just list some, so caching between services, or circuit breakers. Do you make sure that side effects are kept idempotent between services?
Yes, so for us, obviously idempotency is quite important for transactions, and we make sure that when things are repeated, they can't be double-booked from that perspective. Also, it really depends on the types of services that you're talking about, so we have caching around some other, more static types of information. I think we use Hystrix as well in terms of some of the tooling around the retry and circuit breaking, but I'm not really sure how consistently that's used across all the services yet.
I think all the tech leads make sure that there are responsible amounts of timeouts and retries around that. But I don't think it makes sense from our platforms to really standardize on one thing for all services, because for us, it really depends on the types of characteristics per service. So there are some services that are obviously less used because they're more references to static data. And then there are other services, such as transactions which are super high throughput, where we really need to make sure that they work, and idempotency is key for that.
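For readers unfamiliar with circuit breaking, here is a minimal Python sketch of the idea: after a number of consecutive failures, calls to a dependency are rejected immediately for a cool-down period instead of piling up behind a struggling service. This is a generic illustration, not the library or configuration used at N26; the `CircuitBreaker` class, its thresholds, and the `CircuitOpenError` exception are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: let this call through as a trial (half-open).
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit again
        return result
```

Thresholds like these would plausibly differ per service, which matches the point made above that a single standard for all services may not make sense.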
RisingStack: I see, and can you give us a few details about how idempotency is ensured where it has to be?
I think it's basically whenever you book a transaction and you move money, if something fails to get booked, then we don't want to double-book that. And for us that's probably the real key part of moving money around which is like the heart of banking, really.
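One common way to avoid double-booking is to attach an idempotency key to every transfer request and remember the outcome per key. The Python sketch below shows that general technique under stated assumptions; it is not a description of how N26 implements it. The `Ledger` class, the in-memory dictionaries, and the key format are hypothetical, and a real system would keep this state in a durable, transactional store.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Ledger:
    balances: Dict[str, int] = field(default_factory=dict)   # account -> cents
    processed: Dict[str, str] = field(default_factory=dict)  # idempotency key -> result

    def transfer(self, key: str, source: str, target: str, amount_cents: int) -> str:
        """Move money at most once per idempotency key.

        A repeated request with the same key returns the first result
        and does not touch the balances again.
        """
        if key in self.processed:
            return self.processed[key]  # retry of a known request: no second booking
        if amount_cents <= 0 or self.balances.get(source, 0) < amount_cents:
            result = "rejected"
        else:
            self.balances[source] -= amount_cents
            self.balances[target] = self.balances.get(target, 0) + amount_cents
            result = "booked"
        self.processed[key] = result
        return result


if __name__ == "__main__":
    ledger = Ledger(balances={"alice": 1000, "bob": 0})
    print(ledger.transfer("tx-1", "alice", "bob", 300))
    print(ledger.transfer("tx-1", "alice", "bob", 300))  # same key, e.g. a client retry
    print(ledger.balances)
```

Because the result is stored per key, a client that retries after a timeout gets the original answer back, whether or not the first attempt actually reached the ledger.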
RisingStack: We at RisingStack take keeping up with new technologies very seriously, and we are very excited about what's coming next. So we would be very glad if you could share what specific technologies you are looking forward to in 2018 - and looking forward to implementing at N26.
Yeah, so I think what's really exciting about the base platform that we have is that it's already quite mature from a continuous delivery perspective. And I think for us, security is a key thing that we're really trying to weave in. There's a movement that goes beyond DevOps, called DevSecOps, and it is really about the way that we can bring in more automation and more security checking and weave that into the entire development process.
I think that as a general movement it is a really exciting place to be. I think you need really good DevOps and good continuous delivery processes to get to that next level. For us that's a really exciting place to be, because I think we have those base fundamentals. That means that we have a really good opportunity to weave security in more continuously and lead the edge in that way.
Also, I think that there’s another field that goes hand-in-hand with continuous delivery - the idea of continuous compliance. I think one of the interesting things about working in a bank is regulations and reporting, and I think this is something that continuous delivery really helps with. When you have builds, you have a lot of strong traceability about the reports and the information that come out of that. And I think that moving towards continuous compliance is a really great way of understanding how we extract or keep track of the information from our builds. And a continuous delivery pipeline proves that we are continually compliant. There's a tool that we are looking at which is called dev-sec.io; it’s for hardening services.
But what's really interesting is the way that they've built it, which is using BDD-style scenarios - that means that you get really good documentation about the tests that you run against your service, so you can tie each one back to the purpose of the test and the regulation. And then you get automated reporting as part of that.
And then our other goals are really around chaos engineering and chaos testing. Reliability is another key for us, since a bank has to be continually available. What I've seen happen a lot with traditional banks is that they may plan one test a year where they manually test a DR (disaster recovery) activation. We're in a really good spot to move towards experimenting with some of the chaos testing tools that are out there, such as Chaos Monkey from Netflix and some of the other tools that are coming out. Together they will help us build resilience and reliability from the get-go, and make sure that each service that we build really has that aspect in mind.
So these are the two trends that I'm really excited about and that we're going to be taking the company towards at N26. I feel both add a lot of value in terms of safety and reliability, and allow us to really focus on the product once we have them as part of our normal development process.
RisingStack: I cannot help but ask two questions. One of them is, whenever I mention Chaos Monkey to anybody, they just lose their minds, and everybody is saying “no, we're not ready for that yet”. And nobody ever feels they're ready for Chaos Monkey. So was it difficult to convince people to go that way?
We're still on that journey, but I think people are really keen and eager for that. I think the interesting thing at N26 is that everyone is very pragmatic. It's not about using Chaos Monkey for the sake of it. Everyone is behind the idea that we have to prove to ourselves that we are resilient and constantly available, and therefore something like the chaos engineering toolset really makes a big difference. Also, I think everyone is really bought into the agile mindset of starting small and learning from that, and the more you test and break your system, the more resilient and stronger it gets.
I'm kind of lucky here, I didn't have to do a lot of convincing. I think that maybe people are a bit cautious about how we will roll this out, but I think everyone is keen to give it a go.
Because I think it
A) is a really exciting field to be in and
B) adds a lot of value for our users who we are building software for.
So I think both of those things make it really exciting to be an engineer at N26.
RisingStack: That's really great. The other thing I wanted to ask is that you mentioned that business requirements and regulations change quickly when it comes to banking. And I couldn't help but think of Richard Rodger's book, The Tao of Microservices. The main argument he makes is that you want to have your microservices as cattle and not pets, so practically you want to have throwaway microservices. And when there's a new regulation or a new business requirement, you just plug in a new service to handle that, or throw away an old one and create a new one from scratch, because they should be so small and so quick to develop that it shouldn't be a problem. Do you follow anything like that?
So I think in principle yes, microservices should be small sized and rewritable. I think there's also a question of how small is small, which is always a constant raging battle in the microservices world.
I think what's interesting - if I go back to the question about regulation - is that, like all things, it depends, because there are some regulations that really cut across all types of domain areas. The latest one is GDPR, which is about data protection in the EU and about individuals' right to privacy. I think this is an interesting one, because you can argue that you could contain all information recorded about a person in a single place in your system, but that may not be what is important for how your business works. So you'll often have your customer view from a customer services perspective, but then you also have your account view of what that customer has registered with. And there's always that tension between putting all of that into a single place, which means that you'll naturally have a bigger service to replace or just read, and having to work out which things are affected by the regulation. So, I think from that perspective there isn’t an easy answer to say you can put all things into a single service and you will be able to easily replace that to comply with regulation, because it really comes down to what it is that you need to be compliant with, and to understanding the impact of it across your domain. There will be some things that will cut across all things and some of them that will be a lot more isolated.
I think what really matters is more awareness about why the regulation is there, rather than simply following it. What often happens is that you have to do this implementation because rule such-and-such says so, without thinking about what the intent behind it is.
RisingStack: Thank you very much, Patrick, for sharing your insights with us today and telling us more about the role you play at N26. It was great to hear how your system is built and where you're going, so thanks a lot for your time today.
All right, thank you very much for having me on the podcast. I really enjoyed sharing the stories of N26, and thank you very much for the conversation.