Managing Technical Debt

Intro

Good engineering is the result of strong requirements grooming and prioritization within the budgetary constraints of a project. There is no such thing as a "perfect solution" that works the way every stakeholder desires in every scenario, because there is no such thing as an infinite budget. It is the role of experienced engineers to determine and implement the appropriate solution, based on continuous assessment with stakeholders. The cost of a product grows significantly as the acceptable margin for error decreases. In other words, it is often cost-effective to solve the most common scenarios and cost-prohibitive to solve all known scenarios.

A couple of examples:

Non-ECC and ECC Memory

The use of ECC memory involves additional hardware and software to correct memory soft errors, e.g. bit flips caused by radiation. Non-ECC memory may be acceptable in applications where memory corruption is not a critical failure. Some systems can recover from a certain number of errors in software, but relying on software recovery may not be tenable in mission-critical applications such as space or aviation. A quick price comparison of Crucial DDR3 memory on 3-15-2020 showed that an otherwise equivalent 16GB stick of memory was 65% more expensive with ECC.

Redundant Resources

When budgeting for cloud infrastructure, higher availability can be achieved by provisioning redundant resources that are ready for a "hot swap" when the primary resource fails. For example, if a resource has an availability of 99%, you would expect it to be down about 87.6 hours a year (1% of 8,760 hours). Adding a redundant resource with 99% availability, assuming the two fail independently, increases the expected availability to 99.99% (1 - 0.01 × 0.01), an expected downtime of about 53 minutes a year. That is a significant increase in availability, but because we are hosting a redundant resource, it roughly doubles the cost. Do we really require the extra uptime? The answer to this question varies from project to project.
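The availability arithmetic can be checked with a short script. This is a minimal sketch. It assumes that the redundant resources fail independently, that failover is instantaneous, and that a year has 8,760 hours. The function names are hypothetical and are not taken from any cloud provider's tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def combined_availability(availabilities):
    """Availability of redundant resources where the system is up if any one is up.

    Assumes failures are independent and failover is instantaneous.
    """
    prob_all_down = 1.0
    for a in availabilities:
        prob_all_down *= 1.0 - a  # every resource must be down at the same time
    return 1.0 - prob_all_down


def expected_downtime_hours(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


single = combined_availability([0.99])
pair = combined_availability([0.99, 0.99])

print(f"single: {single:.4%}, ~{expected_downtime_hours(single):.1f} hours/year down")
print(f"pair:   {pair:.4%}, ~{expected_downtime_hours(pair) * 60:.1f} minutes/year down")
```

Correlated failures, such as two instances in the same data center, make the real gain smaller than this independent-failure estimate.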

Understanding Tech Debt

The question of how reliable and optimized systems should be is answered through an ongoing process of engaging with stakeholders. Everyone wants to believe that their product is perfect and will do everything all of the time, but that is not realistic. It is the responsibility of the engineering team to communicate early and often with stakeholders. Modern software development should accept that when a Proof of Concept (PoC) or Minimum Viable Product (MVP) is released, concessions may be made to robustness in favor of faster development velocity. This is fundamental to an agile approach to development: setting reasonable timelines and limiting scope allows for iterative, refined development of the product features stakeholders really want (not what they think they want before they see any data from users). A waterfall approach to development greatly lengthens feedback cycles, so stakeholder input can lag weeks or even months behind the current understanding of the problem space.

An agile approach to development requires a firm conceptualization of technical debt - those things that we "leave on the table" or those that we prefer to solve later. Ideally, debt is understood and tradeoffs are made consciously up front. Technical debt that is accrued unintentionally comes in the form of bugs in design or approach, an implicit cost of software development. We know bugs will happen and be found, but we do not know where or when. Tech debt, however, should be considered as an explicit cost of doing business - we knowingly make a design decision that we will want to address later. This choice may result in a week of extra work down the line, but it also may allow us to ship a month sooner, or avoid unnecessary effort if the business has to pivot. In any case, technical debt should be treated like other forms of debt and paid off with prudence. Allowing known defects or performance issues to linger can incur additional debts down the road.

Approaches to Managing Tech Debt

Wikipedia provides an excellent summary of technical debt along with many examples, a few of which we will consider further.

Tight Coupling

Tightly-coupled components, where functions are not modular, the software is not flexible enough to adapt to changes in business needs.

Many times, we want to release something quickly and test the waters with a new product or service. It often makes sense to ensure that the main business logic is implemented as quickly as possible, concerning ourselves less with future adaptability. That being said, with experience comes insight into planning for future changes, and we should account for this debt with the use of good tooling. Modern frameworks exist for most popular problem spaces (e.g. Dynamic Web, Static Web, Mobile) that are opinionated about approach, and provide a scaffolding that properly delineates code concerns. Do not reinvent the wheel here - follow best practices, especially when you want to move fast.

Lack of Tests

Lack of a test suite, which encourages quick and risky band-aid bug fixes.

Shipping code that lacks test coverage is never ideal. If you do not have QA resources, or enough developer time, to invest in complete test coverage for an MVP, at least plan to budget for this later and assess the staffing of your team. Developers must be allotted adequate time to write good unit tests, and QA should strive to automate testing.

Poor Documentation

Lack of documentation, where code is created without supporting documentation. The work to create documentation represents debt.

10,000 lines of code and the application README is just the title of the project: this is not a good place to be. Documentation often gets lost in the shuffle when we want to move quickly. When we can deliver only a fraction of what we would like to, documentation is a popular candidate for deprioritization. Here are some actions you can take now to avoid ending up in a no-knowledge scenario.

  1. Properly scope and document tasks in your project tracker of choice (e.g. Trello, JIRA). Tag every commit related to a task with the task's issue identifier.
  2. Write self-documenting code and provide good code comments. Engineers should make whatever code they write easy for others to understand. This is not optional. When the code itself is properly written and documented, it is easier to lift that documentation up to the project level to fill gaps.
  3. When your current work touches on a place lacking documentation, and you have the information, don't leave it empty - update it.
  4. Don't turn project onboarding for new team members into a cost-only scenario. Every person onboarding onto a project should consider the process they are following critically, making updates to documentation where they find gaps or find they can offer clarity.
  5. Engage less-senior developers, QA, and non-engineers in documentation tasks. Assuming we've followed (1) and (2), we have a strong foundation for knowledge transfer, for improving documentation, and for producing new documentation (e.g. for external teams, APIs). The person who wrote the code tends to know it best, but others can act as proofreaders or reviewers, bolstering documentation if the original author is otherwise engaged.
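Step (1), tying every commit to an issue, can be enforced automatically instead of by convention. Below is a minimal sketch of a Git `commit-msg` hook. Git runs this hook with the path to the commit message file as its first argument, and a non-zero exit aborts the commit. The hook assumes JIRA-style keys such as `PROJ-123`. That pattern and the helper names are hypothetical, so adapt the pattern to your tracker's identifier format.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: reject commits whose message lacks an issue key.

Save as .git/hooks/commit-msg and make it executable.
"""
import re
import sys
from typing import List

# Assumption: JIRA-style issue keys such as "PROJ-123". Adjust for your tracker.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_issue_key(message: str) -> bool:
    """Return True if any non-comment line of the message contains an issue key."""
    # Lines starting with "#" are comments that git strips from the final message.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    return any(ISSUE_KEY.search(line) for line in lines)


def main(argv: List[str]) -> int:
    with open(argv[1], encoding="utf-8") as f:
        message = f.read()
    if has_issue_key(message):
        return 0
    sys.stderr.write("commit rejected: reference an issue key, e.g. PROJ-123\n")
    return 1


# Git always passes the message-file path; do nothing if run without it.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Developers can bypass hooks with `git commit --no-verify`, so teams that want a hard guarantee often repeat the same check in CI.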

Poor Collaboration

Parallel development on multiple branches accrues technical debt because of the work required to merge the changes into a single source base. The more changes done in isolation, the more debt.

As teams grow in size, the overhead of managing differences between code changes grows. This can be mitigated in a few ways.

  1. Prioritization - we should avoid intentionally having multiple developers edit the same functions/files. When possible, work on independent features/bugs (or tech debt!) in parallel so that dependent tasks can proceed in sequence.
  2. Process - each developer is responsible for keeping their work "merge ready" against whatever has already been merged ahead of it. Code that works on your local machine is not "done". If the project has moved ahead of your work, you must act as a steward for your code even after you are done writing it.
  3. Separation of concerns - Your project should be designed so that components have a sensible division based on function (e.g. Model-view-controller). Avoid "roll your own" frameworks. Use modern, popular frameworks that power tens of thousands of other successful software projects.

Don't be Fooled

Some examples from Wikipedia would be better categorized as bugs or a flawed approach:

Lack of knowledge, when the developer doesn't know how to write elegant code.[7]
Lack of ownership, when outsourced software efforts result in in-house engineering being required to refactor or rewrite outsourced code.

What looks good on a spreadsheet is often not the best in reality - considering only the dollar amount when outsourcing without considering track record is risky.

A healthy alignment between business and technology comes from leadership. Here at JBS, we work hand-in-hand with staff engineers and participate in top-to-bottom design and development at many of the companies we work with. We are active in discussing approach, enhancing process, standardizing documentation, and automating testing and infrastructure. At companies without in-house engineering staff, we ensure the products we produce for our clients use well-documented and popular frameworks such as Django & React, and that we follow code health best practices. Building on open source software drastically eases hand-off and reduces the friction of resource transitions.

At JBS we've seen well-engineered projects handed off successfully to other teams with minimal friction, and we've also seen our own people jump in with both feet when an ineffective team couldn't build a solid foundation.

Conclusion

The resolution of tech debt should be budgeted for within the scope of epics, sprints, or the appropriate method for your flavor of agile. It is not good to constantly deprioritize and leave tech debt in the backlog in favor of only features and bug fixes. The cost of resolving tech debt will only increase over time. Since we know the tech debt is there, because we designed and budgeted for it, it must be worked on with consistency to ensure it doesn't snowball into bankruptcy. Ignoring tech debt can lead to major impediments that may restrict the development of new features.

Tech debt is not tech ignorance. Do not be lulled by inexperienced vendors and cheap tech masquerading as "tech debt". A track record of creating MVPs, managing legacy integrations, and transitioning MVPs into mature, well-maintained projects is necessary to properly scope and manage tech debt. Tech debt is something we need to account for, budget for, and plan for when considering business goals now and in the future. There is no need to build things we will never use, but we must take care not to make those things more difficult to add in the future.