Enterprise Microservices: What Does It Take?
Please note that this page is under construction ...
Adopting a microservices architecture is a big challenge for any development team.
Breaking down a monolithic application into a group of small, decoupled services
has an impact on the way the services are designed, implemented, tested, and deployed.
Careful attention is required at the architecture level to identify the services,
to define clear boundaries between them,
and to ensure that, taken together, the services are well orchestrated.
From the end user's perspective, it should feel as if they are still interacting with a single "monolithic" application.
Here are some areas that need to be considered when adopting microservices architecture:
- Performance (Response Time, Throughput, and Scalability)
- Automation
- Error management
- Reporting
- Continuous Integration / Continuous Delivery
- Provisioning
- Deployment
- Configuration Management
- Containerization
- Orchestration
In this page, I will share some thoughts on how to create enterprise applications that meet the requirements listed above
by leveraging a microservices architecture and using a workflow engine and a messaging system
to manage and automate the execution of the services.
The idea is to be able to manage the complexity of microservices and act on their flow of execution.
We also want to be able to audit and monitor the different components of the application and produce reports to better manage performance issues,
identify bottlenecks, and spot problems and errors.
I will also present some frameworks and tools related to the containerization, deployment, and orchestration of services.
The sample design (above) uses an entry service that reads data (CSV, XML, records, ...) from external sources (file system, RDBMS, ...).
This entry point of the application produces payload data that is stored in a staging storage (HDFS).
It also creates a payload message (headers, parameters, variables) that is used as an entry point to start the workflow engine (jBPM).
The payload message is also used as a means to communicate information between the tasks of the flow and the services.
Each time an entity acts on the payload data, it produces a new payload message and sends it to a messaging system (Kafka).
First, the application server reads the payload message from the messaging system and triggers the workflow engine to start the services workflow.
The tasks of the workflow should implement simple business logic and should only be responsible for adding custom parameters to the payload message.
Each task of the workflow should submit the payload message to a specific topic of the messaging system.
The services should subscribe to their specific topics and read messages from the messaging system.
Each service should implement the business logic specific to the executed task.
If the execution is successful, the service should add new parameters to the payload message and submit it to the main topic of the application.
The new payload message should be read by the application server, which resumes the workflow to execute the next task.
If the service fails to execute the task, it should send a new payload message to a specific retry topic.
The new payload message should include new parameters (like the retry number).
If the retry number reaches the maximum number of retries, a new payload message can be sent to an error topic that can be handled later by an administrator.
The services should always check their retry topics for existing messages and handle those first, before handling the messages from their main topics.
The services should track the number of failed retries and implement delay strategies that impose a delay before reading messages from the retry topic.
An administrator (or an automatic task) should be able to resume the process and retry or cancel a failed task.
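To make the flow described above more concrete, here is a minimal sketch of what such a payload message could look like in Java. The field names (messageId, payloadId, retryCount, ...) are illustrative assumptions, not a prescribed schema.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative payload message exchanged between workflow tasks and services.
// The field names are assumptions, not a prescribed schema.
public class PayloadMessage {

    // Correlation and routing metadata (headers).
    private String messageId;          // unique id of this message
    private String payloadId;          // reference to the payload data stored in HDFS
    private long processInstanceId;    // id of the workflow instance to resume
    private String taskName;           // workflow task that produced / expects this message
    private int retryCount;            // number of failed attempts so far

    // Custom parameters added by tasks and services along the flow.
    private final Map<String, String> parameters = new HashMap<>();

    public PayloadMessage addParameter(String key, String value) {
        parameters.put(key, value);
        return this;
    }

    public PayloadMessage nextRetry() {
        retryCount++;
        return this;
    }

    public int getRetryCount() {
        return retryCount;
    }

    // remaining getters and setters omitted for brevity
}
```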
Microservices Workflow Automation: jBPM
Using a workflow engine to automate the microservices workflow execution
makes the flow of execution of microservices easy to understand and allows identifying any bottlenecks in the whole system.
It also allows an administrator (or an automatic backend service) to resume or cancel the process if it was blocked in a specific task.
In a complex enterprise application, the number of microservices can be huge
and any attempt to design one single process model for all microservices will make the workflow very difficult to design, understand, and maintain.
An enterprise application is composed of multiple business domains
that are decoupled from each other but still need to interact and communicate in order to complete cross-domain tasks.
Each business domain will need to have its own workflow, and it will be responsible for orchestrating the services that it owns.
It's also not always easy, or even possible, to identify a specific service that will act as the owner or the main entry point for all the services.
In other words, it's difficult to design one main workflow that orchestrates the flow of execution of all services.
This doesn't mean that we can't design a reasonable number of workflows for the main business activities.
We just need to consider that in some cases the integration between different services may use other approaches to communicate,
e.g. REST APIs, messaging systems, ...
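As a rough illustration, here is how the application server could start one of these workflows from the data carried in an incoming payload message, using the jBPM/KIE API. The process id ("com.example.orderProcess") and the parameter names are placeholders, and the example assumes the process and a default session are defined in the classpath KIE module.

```java
import java.util.HashMap;
import java.util.Map;

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;
import org.kie.api.runtime.process.ProcessInstance;

public class WorkflowStarter {

    public long startWorkflow(String payloadId, String messageId) {
        // Load the KIE container from the classpath (kmodule.xml + BPMN resources).
        KieServices ks = KieServices.Factory.get();
        KieContainer container = ks.getKieClasspathContainer();
        KieSession session = container.newKieSession();

        // Process variables taken from the incoming payload message.
        Map<String, Object> params = new HashMap<>();
        params.put("payloadId", payloadId);
        params.put("messageId", messageId);

        // "com.example.orderProcess" is a placeholder process id defined in the BPMN model.
        ProcessInstance instance = session.startProcess("com.example.orderProcess", params);
        return instance.getId();
    }
}
```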
See this page for more information and code samples of jBPM:
Java Business Process Model (jBPM)
Messaging System: Kafka
Using a messaging system to manage the communication between the workflow tasks and the services helps decouple them.
It also allows adding new instances of services without the need to apply any additional configuration.
A service needs to subscribe to a specific topic, start reading messages from that topic, and execute the related task.
The load on services is automatically balanced over the different instances,
and each instance will execute more or fewer tasks depending on the resources available on the machine on which it's installed (RAM, CPU, disk, ...).
A service should be responsible for notifying the failure of execution of a task,
and it should implement strategies to manage failed tasks and retry their execution if required.
The payload message should include clear metadata so the services can interpret it and make the adequate decision.
It's expected that a service may consume a message but fail to notify whether the task was successfully executed or not.
Such situations are difficult to manage. A service may successfully complete a task but fail to notify it.
The opposite scenario can also happen: a service may fail to complete a task and also fail to notify it.
In another scenario, the service freezes or stops and there's no way to get any feedback on the state of the task it was executing.
The workflow engine makes it easy to track and monitor the state of the execution of tasks
and it's possible to resume or cancel a task based on some defined strategies (specific events, timeout, ...).
An administrator can use a dashboard and monitor the processes and act on them if needed.
We can also implement a monitoring service that acts on the executed tasks using the same or different strategies.
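The sketch below illustrates, under these assumptions, how a service could consume messages from its own topic, publish the enriched payload to the main topic on success, and route failures to its retry topic. The topic names, group id, and broker address are placeholders, and the payload is kept as a plain string for brevity.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class InvoiceServiceWorker {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "invoice-service");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            // The service only listens to its own topic; the topic name is a placeholder.
            consumer.subscribe(Collections.singletonList("invoice-service-topic"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        String result = executeTask(record.value());
                        // Success: publish the enriched payload back to the main topic
                        // so the application server can resume the workflow.
                        producer.send(new ProducerRecord<>("application-main-topic", record.key(), result));
                    } catch (Exception e) {
                        // Failure: publish to the retry topic; the retry count is assumed
                        // to be tracked inside the payload message itself.
                        producer.send(new ProducerRecord<>("invoice-service-retry-topic", record.key(), record.value()));
                    }
                }
            }
        }
    }

    private static String executeTask(String payloadMessage) {
        // Business logic specific to this service goes here.
        return payloadMessage;
    }
}
```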
See this page for more information and installation steps of Kafka:
Install and configure Apache Kafka
Securing Service to Service Communication: JWT
I will consider two types of communications between services: external communication and internal communication.
The first type of communication requires an authentication from an end user (or an external service).
In this case it's necessary to verify that all requests are authenticated and authorized before executing the related tasks.
It's also important to ensure that sensitive data is encrypted.
The second type of communication happens between internal services
(let's assume they are all backend services sitting "safely" behind a firewall,
which is often the argument given when one wonders why backend services are not secured).
In most cases we don't need internal services to authenticate when communicating with other internal services
but we must validate that services are authorized before executing their requests
and we must validate that the request really comes from the claimed source.
Identifying the type of communication is important to decide the requirements (authentication, authorization, encryption)
and the solutions to secure the communication between services.
JWT (JSON Web Token) provides the infrastructure that services can use to securely transmit information.
The payload data of the token can hold the information required to identify the issuer and validate the request
(in general the payload data should not contain sensitive information).
The token has a signature that is used to verify the token and ensure the integrity of its data.
The issue here is that the token, if intercepted, can be used by anyone: it can be replayed to the service and will still be verified successfully.
If needed, the payload data can be encrypted, but the encryption requires a mechanism to safely share a secret key
(which is not always an easy task when managing a very large set of distributed services).
Using ZooKeeper as a centralized configuration storage may alleviate some of these difficulties
by allowing the services to use a secure place where they can fetch both the secret keys and the public keys of each service.
The payload message will contain the JWT token that services can use to verify requests.
In the case where external end users are interacting with services through an API gateway,
the use of OAuth is important to make sure the requests are authenticated and authorized.
The API gateway needs to manage the OAuth token to verify requests from the end user,
and it will be responsible for creating an internal JWT token that will be used to communicate with internal services.
The OAuth token is an access token that shouldn't hold any sensitive information and should only point to the authorization server.
The internal token may contain all the needed information, and verifying it should not require any communication with the authorization server.
To strengthen the security of the JWT token, the services may add additional information to it,
such as the identity of the issuer, the targeted services, and the expiration date.
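As an illustration, here is a minimal sketch of issuing and verifying such an internal token with the jjwt library (0.11.x API). The symmetric key is generated locally only for the example; in practice it would be fetched from a shared secure store such as ZooKeeper. The service names and the 5-minute expiration are arbitrary choices.

```java
import java.util.Date;
import javax.crypto.SecretKey;

import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.SignatureAlgorithm;
import io.jsonwebtoken.security.Keys;

public class InternalTokenHelper {

    // For the example only; a real service would fetch the shared key from a secure store.
    private final SecretKey key = Keys.secretKeyFor(SignatureAlgorithm.HS256);

    /** Issues an internal token carrying the issuer, the targeted service, and an expiration date. */
    public String issueToken(String issuerService, String targetService) {
        return Jwts.builder()
                .setIssuer(issuerService)
                .setAudience(targetService)
                .setExpiration(new Date(System.currentTimeMillis() + 5 * 60 * 1000)) // 5 minutes
                .signWith(key)
                .compact();
    }

    /** Verifies the signature and returns the claims; throws JwtException if the token is invalid. */
    public Claims verifyToken(String token) {
        return Jwts.parserBuilder()
                .setSigningKey(key)
                .build()
                .parseClaimsJws(token)
                .getBody();
    }
}
```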
Services Configuration: ZooKeeper
A service may require custom configurations that are specific to each instance.
This kind of configurations usually can be set at startup time (initialization or deployment script)
and can be stored locally using environment variables or properties files.
In most cases, there is no need to change these configurations at runtime,
and if this needs to happen then a restart of the service is usually required so that it gets initialized with the new configuration.
A service also shares common configurations between all its instances.
An external persistent storage is required in this case.
The main challenge here is deciding when and how these configurations should be persisted in the persistent storage.
One approach is to have a common administration service that needs to be started first
and that will be responsible for saving the default configuration of the service in the persistent storage.
Once the startup and initialization of the administration service instance is completed successfully, the service instances can be started.
A shared configuration needs to be easily managed and customized.
APIs should be provided to customize the configuration of the services.
All instances need to be notified when a configuration is updated.
Using ZooKeeper as a persistent storage allows distributed services to share configurations.
It provides the infrastructure to notify about any changes on the configuration.
Services can use ZooKeeper to share private configuration of each instance.
It can be used to share the public keys of each service that will be used to validate JWT tokens when handling external requests.
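Here is a minimal sketch, using the plain ZooKeeper client, of how a service instance could read a shared configuration znode and register a watcher to be notified when it changes. The connection string and the znode path are placeholders.

```java
import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class SharedConfigReader {

    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble; the connection string is a placeholder.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> { });

        String configPath = "/services/invoice-service/config"; // illustrative znode path

        // Watcher that fires when the shared configuration is updated.
        Watcher configWatcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeDataChanged) {
                    System.out.println("Shared configuration changed: " + event.getPath());
                    // re-read the znode and re-register the watcher here
                }
            }
        };

        // Read the current configuration and register the watcher in the same call.
        byte[] data = zk.getData(configPath, configWatcher, null);
        System.out.println("Current configuration: " + new String(data, StandardCharsets.UTF_8));
    }
}
```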
See this page for more information and code samples of ZooKeeper:
Apache ZooKeeper
Logging: Solr
To debug an error, developers may need to analyze logs from different services.
So administrators need to log in to multiple hosts to get the log files.
Even in simple deployments, this task can be very difficult
as administrators may need to know which log files to collect and from which hosts.
Developers may take a lot of time to debug an error,
as they need to correlate the logs from different services and multiple instances.
Services need to produce errors and events that respect a common schema for log messages.
The schema should be structured in main metadata and extended metadata.
The main metadata should include fields that are relevant for search queries and should allow an easy correlation between events.
The extended metadata, if needed, should include details that are specific to the event and can be used to better understand the cause of the error.
Log metadata should allow segregating messages based on
the type of the event (system, application, business),
the source of the event (host, service, instance),
and the correlation information (message id, payload id, user id).
Using Solr to index errors and events generated by services
helps developers to get a centralized place where they can search for specific errors
without having to connect and log in to specific hosts and search in the log files.
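As an illustration, the sketch below uses SolrJ to index a log event that follows the kind of schema described above. The collection URL and the field names are assumptions; a real deployment would define them in the Solr schema.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class LogIndexer {

    public static void main(String[] args) throws Exception {
        // The "logs" collection URL is a placeholder.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/logs").build();

        SolrInputDocument doc = new SolrInputDocument();
        // Main metadata: searchable fields used for correlation.
        doc.addField("id", "msg-0001");
        doc.addField("event_type", "application");   // system | application | business
        doc.addField("service", "invoice-service");
        doc.addField("instance", "invoice-service-2");
        doc.addField("host", "node-03");
        doc.addField("message_id", "a1b2c3");
        doc.addField("payload_id", "payload-42");
        doc.addField("level", "ERROR");
        // Extended metadata: free-form details specific to the event.
        doc.addField("details", "Timeout while calling the payment gateway");

        solr.add(doc);
        solr.commit();
        solr.close();
    }
}
```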
A monitoring service can use the indexed logs to audit and monitor the services
and produce reports to better manage the performance issues and to identify bottlenecks and errors.
Solr features (faceting, filtering) can be used to limit the scope of a search and hence focus on a limited set of messages when debugging errors.
They can also be used for analytics purposes by providing reports specific to each category of messages.
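A query sketch under the same assumptions: filtering the errors of one service and faceting on the host to spot failing nodes.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LogSearch {

    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/logs").build();

        // Search for errors of one service and facet on the host.
        SolrQuery query = new SolrQuery("level:ERROR");
        query.addFilterQuery("service:invoice-service");
        query.addFacetField("host");
        query.setRows(20);

        QueryResponse response = solr.query(query);
        System.out.println("Matching errors: " + response.getResults().getNumFound());
        response.getFacetField("host").getValues()
                .forEach(c -> System.out.println(c.getName() + ": " + c.getCount()));

        solr.close();
    }
}
```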
See this page for more information and code samples of Solr:
Apache Solr
Containerization: Docker
Containerization brings a lot of benefits to microservices development and deployment.
It enforces a consistent way of delivering components and ensures that services are portable across platforms and environments.
It makes the installation of services easier and ensures that the resources allocated to each service are respected.
It also allows a better orchestration of the microservices and eases scaling and deploying new instances as needed.
Container Orchestration: Kubernetes
Managing containers is no easy task, especially when dealing with a large number of them.
Container orchestration brings solutions to container management issues
such as deployment, resource management, availability, scalability, ...