Together we boost performance

For new and existing mobile and web apps

High performance: scalable

Do you need a fast user experience? Something that scales so that you can handle peak traffic while keeping costs acceptable? Blazing performance is where we shine. More...

Big data

Use that data collected by all those touches, clicks, likes, reviews, friends, transactions and more. And scale big-time! More...

Quick bug fixes

Faced with nasty bugs that need to be resolved ASAP? We can rigorously drill down until the problem is found and resolved. We've been there more than once. We can probably help you out. More...

Beautiful architecture

Developing something new and want a pristine architecture that scales well, delivered quickly? The combination of a well thought out structure and a quick proof of concept that actually works.

Cloud and bare metal

Considering moving from bare metal to virtualized machines? On-premise or in the cloud? Infrastructure-as-code? We can help. More...

Team

A small, smart and experienced team of hands-on full stack software architects/coders that brings results quickly. Take us on board for a few of your sprints with razor-sharp focus! Let's talk.

High performance
Approach

If you need to rapidly improve application/server performance, we can help you.

In a nutshell, we help make your software perform better by analyzing and removing local performance bottlenecks and by splitting the workload over multiple servers.

From quick wins to full auto scalability

First, an analysis is made of the current and foreseen bottlenecks and of the requirements. We tackle the most critical and easiest-to-fix parts first, ensuring that you get a performance boost quickly. There is no one-size-fits-all approach, but after a thorough analysis, typically the following high level steps are followed:

  • Implementation of quick wins
  • Implementation of caching, specialized indexing and improvement of data layout
  • Adapting architecture towards auto scalability

At a more fundamental level this is all about (a) keeping data accesses short and (b) parallelizing accesses where possible. Keeping data accesses short means attempting, as much as possible, to access data from the processor cache, then from RAM, then from SSD(/HD) and only then over a network. This in turn involves a careful 'sequential' data layout. Parallelizing means being able to use multiple cores and multiple machines.
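
As a minimal illustration (Python/NumPy here, purely as a sketch; exact timings depend on your hardware), gathering the same data sequentially versus in random order shows the cost of cache misses:

    # Sketch: sequential vs. random access over the same array.
    # The random gather defeats the CPU cache and prefetcher.
    import time
    import numpy as np

    n = 20_000_000
    data = np.random.rand(n)
    idx_seq = np.arange(n)              # sequential access pattern
    idx_rnd = np.random.permutation(n)  # random access pattern

    t0 = time.perf_counter()
    s_seq = data[idx_seq].sum()
    t1 = time.perf_counter()
    s_rnd = data[idx_rnd].sum()
    t2 = time.perf_counter()
    print(f"sequential: {t1 - t0:.2f}s, random: {t2 - t1:.2f}s, "
          f"same result: {np.isclose(s_seq, s_rnd)}")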

Out of these, the quick wins option is the easiest (and some trivial steps have most often already been taken).

Implementation of caching, specialized indexing and improvement of the data layout typically also gives a huge performance impact. Depending on your needs, this may be enough.

Finally, adapting the architecture towards auto scalability typically requires more effort. We can tackle this in an incremental manner, based on an overall analysis and design which we do first.

In this last phase, the application architecture and data model are updated to allow for appropriate partitioning, fault tolerance and (auto) scaling that meets your requirements.

Implementation of quick wins

Before making the transition to a distributed architecture, there may be some quick wins which can be implemented if you have not already done so. Roughly speaking we have the following options, from the trivial to the somewhat more complex:

  • Using a Content Delivery Network for your static data(*), minimizing payload size (minify JavaScript, CSS, HTML, images) and minimizing roundtrips to the server
  • Running the database on a separate server
  • Fine-tuning: proper settings of OS, (J)VMs and applications. Make sure those /etc/ files are not simply the 'demo' defaults...
  • Ensuring hardware specs meet requirements (RAM, CPU speed, IOPS)
  • Running application logic in parallel on multiple machines (behind a load balancer/reverse proxy)
  • Optimizing the keys of your database
  • Optimizing poorly written queries or algorithms. Let's strive for O(log N) or better...
  • Removing bottlenecks related to 'latency roundtrips' (see the sketch after this list)

(*) Strictly speaking this is a distributed architecture, but in practice it can be implemented quite easily without any fundamental rearchitecting, because static data such as fixed images is only very loosely coupled to the rest of the application and does not need to be updated frequently.
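
As an illustration of that last quick win: batching many small requests into one roundtrip often yields an order-of-magnitude improvement. A hypothetical sketch using the redis-py client against a local Redis (key names are illustrative):

    # Sketch: turning 1000 network roundtrips into one.
    # Assumes a local Redis and the redis-py client.
    import redis

    r = redis.Redis(host="localhost", port=6379)
    keys = [f"user:{i}:profile" for i in range(1000)]

    # Naive: one roundtrip per key -- network latency dominates.
    profiles = [r.get(k) for k in keys]

    # Batched: a single roundtrip via a pipeline (or simply r.mget(keys)).
    pipe = r.pipeline()
    for k in keys:
        pipe.get(k)
    profiles = pipe.execute()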

Implementation of caching and specialized indexing

The second option - caching, specialized indexing and improvement of data layout - can typically give a huge performance boost for read-heavy applications (which is often the case). Most probably, options can be identified for caching significant parts of the data using e.g. Redis or Memcached, significantly reducing the load on the database. For certain data it may also be valuable to index the data outside of the origin database for features such as very fast search, using e.g. Solr/Lucene.
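
A minimal sketch of the cache-aside pattern this typically boils down to, assuming redis-py; fetch_product is a hypothetical stand-in for your data access layer:

    # Sketch of the cache-aside pattern: serve reads from Redis,
    # fall back to the database on a miss.
    import json
    import redis

    cache = redis.Redis(host="localhost", port=6379)
    TTL_SECONDS = 300  # tolerate data up to 5 minutes stale

    def get_product(product_id, db):
        key = f"product:{product_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)           # cache hit: database untouched
        product = db.fetch_product(product_id)  # cache miss: hit the origin DB
        cache.set(key, json.dumps(product), ex=TTL_SECONDS)
        return product

The TTL is a design choice: it bounds how stale the served data can get, which is acceptable exactly for the read-heavy, rarely-updated data mentioned above.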

Big Data
Big Data applications

Every action, every touch, every click of one of your visitors or customers generates a data point that can drive your insights. You may also want to combine this data with the massive amounts of external data out there, varying from weather conditions to the availability of flights or hotel rooms.

There are many applications of big data. Some practical, commercial ones include:

  • Customized user experience and recommendations (promoting to customers what is most relevant for them)
  • Search - including auto completion (related to the above point)
  • Social interaction: chat, comments, reviews
  • Traditional off-line analytics and on-line analytics

Gathering and processing data

The above use cases rely on:

  • Gathering large volumes of data and storing them effectively in (distributed) databases
  • Online real-time analytics (stream-based processing)
  • Batch-based processing (contrasted with stream processing in the sketch below)
  • Online transaction processing
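
To make the stream/batch distinction concrete, a minimal pure-Python sketch, standing in for frameworks such as Spark (batch) or a dedicated stream processor:

    # Sketch: batch vs. stream processing of click events.
    from collections import Counter, deque
    import time

    def batch_top_pages(events):
        """Batch: crunch the complete, stored dataset in one go."""
        return Counter(e["page"] for e in events).most_common(10)

    class SlidingWindowCounter:
        """Stream: keep per-page counts over the last window_s seconds,
        updated event by event as the data flows in."""
        def __init__(self, window_s=60):
            self.window_s = window_s
            self.events = deque()   # (timestamp, page)
            self.counts = Counter()

        def add(self, page, ts=None):
            ts = time.time() if ts is None else ts
            self.events.append((ts, page))
            self.counts[page] += 1
            # Evict events that have fallen out of the window.
            while self.events and self.events[0][0] < ts - self.window_s:
                _, old_page = self.events.popleft()
                self.counts[old_page] -= 1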

Adapting application architecture, data model and storage model towards auto scalability

Despite all the hype, there is no silver bullet here: what the optimal solution is for you really depends on your use case, quality of service and server budget. Everyone faces the CAP theorem. Solutions may involve (distributed) NoSQL databases, search engines, key value stores and processing frameworks such as:

  • Cassandra
  • MongoDB
  • Spark/Hadoop
  • Redis
  • ElasticSearch
  • Solr/Lucene

Especially for the transaction layer it will most probably involve a more traditional relational SQL database (see the sketch after this list), such as:

  • MySQL
  • MariaDB
  • PostgreSQL
  • Microsoft SQL Server
  • Oracle databases
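
The sketch below shows what that buys you, using Python's built-in sqlite3 purely as a stand-in for any of the databases above (table and column names are illustrative): the transfer either happens completely or not at all.

    # Sketch: an atomic transfer between accounts.
    import sqlite3

    conn = sqlite3.connect("shop.db")
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (id TEXT PRIMARY KEY, balance INTEGER)")

    def transfer(conn, src, dst, amount):
        with conn:  # one transaction: commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback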

Because it is not simply a matter of 'throwing your big data into a huge database', key questions that need to be answered for each use case are:

  • Which parts must be transaction-safe?
  • For which parts is eventual consistency good enough, and how to deal with this?
  • How best to partition the data for performance and fault tolerance (see the sketch below)?
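
As a minimal sketch of the partitioning question (shard names and replica count are illustrative; stores such as Cassandra or MongoDB handle this internally):

    # Sketch: hash-based partitioning of keys over shards, with simple
    # replication for fault tolerance.
    import hashlib

    SHARDS = ["db-0", "db-1", "db-2", "db-3"]
    REPLICAS = 2  # each key lives on two shards

    def shards_for_key(key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        primary = h % len(SHARDS)
        return [SHARDS[(primary + i) % len(SHARDS)] for i in range(REPLICAS)]

    print(shards_for_key("user:42"))  # e.g. ['db-1', 'db-2']

Note that this naive modulo scheme reshuffles most keys when a shard is added or removed; consistent hashing is the usual answer to that.
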
Cloud/Bare metal
Infrastructure

Your typical high performance architecture will involve one or more load balancers, application servers, database servers and of course a Content Delivery Network.

With the advent of 'Big data', the need for auto scalability, and the need for rapid application development, chances are that the number of servers deployed increases: from Monolith to Microservices.

Decisions

So here you are faced with a number of key decisions, which we can help you with as an independent party with no ties to any of the suppliers.

  • What type of (virtual) infrastructure to use for each piece?
    • Regular servers ('Bare metal') or Virtualized (KVM, Xen, ...)?
    • On-premise, co-located or private-rack, private room in data center or in 'the Cloud'?
    • Infrastructure as a Service (IaaS) - 'hire virtual machines' - or even Platform as a Service (PaaS) - 'hire MongoDB including backups'
  • How to orchestrate / manage your services across the infrastructure?
  • How to maintain stability and monitor your services across the infrastructure?

Key considerations

Key elements that play a role in these designs are:

  • Control: The amount of operational control that you want to maintain. E.g., a Platform-as-a-Service (PaaS) can give an SLA with good up-time, but that guarantee is only worth so much when the provider schedules an upgrade or migration. API compatibility issues can also severely break things.
  • Independence: The extent to which you want to avoid vendor lock-in. PaaS based on open source is great, but once you get locked into custom modifications things can get tricky.
  • Reliability: If managing servers and database configuration is not one of your core competencies, then leaving this to a third party with a good track record may yield higher reliability.
  • Costs: Out-of-pocket costs for PaaS will be higher, but the all-in costs could be lower.
  • Security and privacy: Company policies and law may dictate one over the other.

Infrastructure as code: what we can do for you

We can help you define an architecture and your migration or greenfield implementation of 'infrastructure as code'. We have experience with providers and technologies such as:

  • Configuration management with tools such as Chef and Puppet
  • Virtual machines (such as KVM) on your own bare metal (on premise or at your own hosting provider of choice)
  • Cloud providers such as Google Cloud Platform (GCP), Amazon Web Services (AWS) or Microsoft Azure
  • Lightweight virtualization / packaging such as Docker
  • Autoscaling / orchestration such as Kubernetes
  • Monitoring solutions such as Nagios, ntop and Pingdom
  • Load testing with tools such as Vegeta on a large cluster of machines

We are independent of any specific provider and have a knack for open source technologies, so you can be assured that you will not be locked into a specific platform provider/vendor.

Debugging

The TL;DR is that we can bring in a small but very smart and experienced team that will rigorously drill down until the problem is found and resolved.

If you are faced with nasty bugs in your applications that are hard to diagnose or solve, we can help.

Approach

If the root cause of the issue is not known and there is pressure to solve the problem, we first investigate what temporary 'quick fixes' or workarounds are possible. These are implemented, tested and rolled out in a controlled manner.

To find the root cause of the problem we of course use the scientific method, together with divide and conquer / difference analysis. In this process, the entire team is extremely critical and goes through an iterative process of:

  • analysing
  • questioning
  • formulating hypotheses (and discarding old ones)
  • developing testable predictions
  • executing tests and gathering test data

The hardest part of this exercise is usually finding a way to reproduce the error (especially in the case of race conditions). Once the error has been found, solving it is often(*) relatively easy.

Reproducing the error

To reproduce the error it may help to devise a test setup that bombards the software with a huge number of events, in parallel and in a (pseudo-random) manner.
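
A minimal sketch of such a setup (the unit under test and its handle() call are hypothetical): many threads fire seeded pseudo-random events, so a failing run can be replayed with the same seeds.

    # Sketch: bombard a (hypothetical) unit under test from many threads
    # with seeded pseudo-random events.
    import random
    import threading

    def worker(seed, unit, n_events=10_000):
        rng = random.Random(seed)  # per-thread seed: reproducible sequence
        for _ in range(n_events):
            op = rng.choice(["create", "read", "update", "delete"])
            unit.handle(op, rng.randrange(100))  # assumed API of the unit under test

    def bombard(unit, n_threads=32, base_seed=1234):
        threads = [threading.Thread(target=worker, args=(base_seed + i, unit))
                   for i in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()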

Even after taking special measures, there may still be cases in which the problem reproduces only very sporadically. The analysis and hypothesis formulation steps then become all the more important: data is scarcer, and typically, once your hypothesis gets closer to the root cause, it also becomes easier to reproduce the problem.

Once an error can be reproduced on demand, it can be solved relatively quickly by using a divide and conquer and difference analysis approach.

Key factors for successful debugging

Being quick and successful in debugging requires the combination of the following approach and skills:

  • A creative, associative, inductive way of thinking that allows the team to come up with good hypotheses.
  • Experience in the applications/technologies covered, to help formulate relevant hypotheses more quickly.
  • A rigorous fact-based approach with sharp deductions that does not allow any assumptions to creep in.
  • A thorough understanding of the subject matter. This cannot simply be a copy/paste of some Stack Overflow text without really understanding what is going on.
  • Being aware of the fact that there may very well be more than one bug: this means that divide and conquer can lead you to chase a red herring.
  • Good data logging and visualization tools
  • Being able to iterate and test very rapidly: There should be no red tape and quick/responsive communications with all parties involved.
  • A strong drive and focus to find the errors and to keep on looking

Common error causes

Likely culprits that we have seen over and over again (and that are hard to reproduce) are:

  • Race conditions
  • Deadlocks
  • Memory leaks(**)
  • Numerical instability: with floating point numbers, exact comparison with == rarely makes sense (see the example after this list).
  • And for the low level C and Assembly code out there: of course the trivial uninitialized memory reads and invalid memory accesses ('buffer overflows')
  • And for the managed code out there: Garbage Collection (GC) stalls.
  • And practical but common:
    • Version incompatibility on very specific functionality. It seems to work, but it is not OK.
    • External services sporadically failing.
    • Default settings that 'seemed OK' for development, but not for production.
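
For example, the floating point pitfall in a few lines of Python:

    # The classic pitfall: exact equality fails where "close enough"
    # is what you actually mean.
    import math

    print(0.1 + 0.2 == 0.3)              # False: binary rounding error
    print(0.1 + 0.2)                     # 0.30000000000000004
    print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance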

Based on our considerable experience in debugging and software development, our extreme drive, creativity and smarts, and our bent for being (overly) critical, we believe that we can help you significantly reduce the time needed to find the error. Sometimes a mere fresh perspective can speed things up considerably for the existing team.

After solving the error

As a last step we want to be really sure that the error is solved in a robust manner, and that no side effects have crept in.

Beside code review and design review, extensive (stress) testing is performed to ensure that the application is good to go.

(*) But unfortunately not always. For example, problems caused by Garbage Collector stalls may require significant code rewrites. Also, sometimes parts of an application have become convoluted by successive authors 'fixing' other authors' mistakes and adding functionality without fully understanding what is going on. In these cases rewrites of larger chunks of code may be required.
(**) Yes these can also occur with managed run times / scripted languages...

Contact us

Give us a call: +31 20 6251599 (The Netherlands)