Technical Documentation Challenges for Large Doc Sets

Large sets of structured documentation face unique technical communication challenges. This list is based on my technical writing experience at MathWorks. All opinions are my own. I’d love any insights on these challenges.


MathWorks creates technical computing software based on its programming language MATLAB and modeling software Simulink. To address different technical computing needs, additional products build onto MATLAB and Simulink. Each product has accompanying documentation. This results in a doc set that has thousands of pages, is highly technical, has complex requirements, has possibly millions of readers worldwide of varying abilities, and must appropriately complement the software.

Search Engine Optimization

MathWorks doc doesn’t have a problem of authority since Google trusts the MathWorks site. However, other problems arise:

  • Disambiguation: Multiple products can perform the same operation, or different versions of the same operation. Example: Both MATLAB Production Server and MATLAB Compiler SDK provide a Java client. In a search for “matlab java client”, how does the user tell which pages belong to which product? Further, a new user will not even know these products exist or how they interact with MATLAB.
  • Info buried in pages: Especially given the move to longer format pages, essential info can end up further down the page. Even when an <h3> is used, Google will not promote the result over a better optimized non-doc result. Example: Searching for “matlab solve inequalities” promotes MATLAB Answers posts over the solve doc, which has a perfect <h3> match in Solve Inequalities. These non-doc posts may contain inaccurate information and will eventually become outdated. However, doc cannot create an SEO-optimized page for every question a user has.
  • Phrasing based on task versus functionality: For highly technical tasks, the user might not know the functionality needed. The user will then search by task, not functionality. However, doc pages may be labelled by functionality. Even if labelled by task, doc cannot capture every task or variation on a task. Further, Google’s synonym matching only goes so far when it comes to highly technical documentation. Example: The rrelief function is labelled as “Importance of attributes (predictors) using ReliefF algorithm”. Should “ReliefF algorithm” come first or “importance of attributes”? Then, “(predictors)” is a synonym users might use instead of “attributes”. Are there other synonyms? How might they be captured? What else might a user search for when they need this function?
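
One way to explore the synonym question is to enumerate the phrasings a user might type from a hand-maintained synonym table. The following is only a minimal Python sketch: the table contents, the extra synonyms “features” and “ranking”, and the function name query_variants are my own assumptions for illustration, not anything MathWorks uses.

```python
from itertools import product

# Hypothetical synonym table; a real one might come from search logs or writers.
SYNONYMS = {
    "attributes": ["attributes", "predictors", "features"],
    "importance": ["importance", "ranking"],
}

def query_variants(phrase):
    """Return every variant of phrase obtained by swapping in listed synonyms."""
    # Words without an entry in the table are kept unchanged.
    options = [SYNONYMS.get(word, [word]) for word in phrase.lower().split()]
    return [" ".join(choice) for choice in product(*options)]

variants = query_variants("importance of attributes")
for variant in variants:
    print(variant)
```

A list like this could be compared against real search queries to see which phrasings the page title and headings fail to cover.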

Updating Doc for Software Changes and Multiple Versions

  • As software changes, documentation needs to be updated. In large doc sets, finding all doc affected by a software change isn’t always simple.
  • Code examples may break with software changes. Testing documentation may seem straightforward but becomes much harder when combined with formatting, numerical artifacts, links in output, external dependencies for code, etc.
  • Screenshots need updating, a very challenging problem with GUIs
  • Continuous publishing becomes harder as dependencies increase
  • Users are often on older releases. They may find doc on functionality they don’t have. While this is frustrating, it also motivates users to upgrade.
  • Function changes are listed in the release notes but not mapped to the associated doc pages. However, if the changes were included on those pages, users on recent releases would have to wade through compatibility info they don’t need, which is annoying, and doc would lose authority with them.
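
To make the point about formatting and numerical artifacts concrete, here is a minimal Python sketch of an output comparison that tolerates both. The function name outputs_match and the tolerance values are assumptions for illustration; a real doc test harness would also need to handle links in output and external dependencies.

```python
import math
import re

# Matches integers, decimals, and scientific notation such as 3, 0.3333, 1e-5.
NUMBER = re.compile(r"[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?")

def outputs_match(expected, actual, rel_tol=1e-6):
    """Compare two output strings, ignoring whitespace layout and tiny numeric drift."""
    exp_nums = [float(n) for n in NUMBER.findall(expected)]
    act_nums = [float(n) for n in NUMBER.findall(actual)]
    # Non-numeric text must match once numbers are masked and whitespace collapsed.
    exp_text = " ".join(NUMBER.sub("#", expected).split())
    act_text = " ".join(NUMBER.sub("#", actual).split())
    if exp_text != act_text or len(exp_nums) != len(act_nums):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=1e-12)
        for e, a in zip(exp_nums, act_nums)
    )

# Same value, different line breaks and trailing digits.
same = outputs_match("ans =\n    0.3333", "ans = 0.33330001")
# Genuinely different value.
different = outputs_match("ans = 0.3333", "ans = 0.5")
```

The design choice here is to separate text from numbers: text must match exactly after whitespace is collapsed, while numbers only need to agree within a tolerance.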

Complex, Highly-Technical Workflows

  • Workflows can stretch across multiple products. However, each product’s doc is kept self-contained because users don’t have all products and mentioning other products is distracting.
  • Users have complex and highly technical workflows that use a product in ways that the writer cannot imagine.
  • It isn’t possible to document every single use or even the majority of uses. We have two options: functionality-based doc, where we simply show the functionality and the user adapts the doc to their task, or task-based doc, where we show how the functionality solves the user’s task. Of course, the latter is much more relevant and useful. But with the caveats above, documentation becomes a combination of the two based on judgement.

Teaching Users to Code

Users learn to code from documentation and pick up its coding practices. However, doc is not a good teacher. For example, doc may use code like fourier(a*exp(t)). Users follow this pattern instead of declaring expressions separately and end up writing fourier(a*exp(t) + cos(t) + log(sin(tan(t)) - 5*zeta(t))). This learned behavior does not scale: small snippets of doc code do not teach the coding practices necessary for large, complex programs, yet users carry those habits into ever more complex code.
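
The contrast between inlining everything and declaring expressions separately can be sketched in a few lines. This is a Python stand-in rather than MATLAB, it does not use fourier, and the function and variable names are hypothetical; it only illustrates the habit.

```python
import math

def signal_nested(a, t):
    # Everything inline, as a reader might write after copying a one-line doc snippet.
    return a * math.exp(t) + math.cos(t) + math.log(2 + math.sin(math.tan(t)))

def signal_named(a, t):
    # Same computation with each piece declared separately and given a name.
    growth = a * math.exp(t)
    oscillation = math.cos(t)
    correction = math.log(2 + math.sin(math.tan(t)))
    return growth + oscillation + correction

nested_value = signal_nested(2.0, 0.5)
named_value = signal_named(2.0, 0.5)
```

Both functions compute the same value, but the second one can be read, debugged, and extended one term at a time.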

Complex Tooling Requirements and Content Generation

  • Many products mean many differing requirements
  • Every additional requirement must map to some XML tagging and transformation scenario
  • Additional tagging means overhead for writers and maintenance for doc developers
  • Need to maintain a standard look and style
  • What is the trade-off between adding tags and transformation rules and retaining a framework that is usable?
  • How does the Style Guide adapt to the web? How relevant is it?
  • Multiple writers working on the same content need to coordinate
  • Long build times for large doc sets mean doc is always building and cannot be viewed by others

Coordination between Software and Documentation

Doc is written after the code is baked. After developers are done, writers need time to write doc while devs are ready to move on. Further, writers must still provide usability input for the next project their developers are on. This is easy in a startup but in a large company with complex software and dependencies, coordination issues are created that require careful orchestration and process.

Institutional Knowledge of Documentation

When doc sets are sprawling and decades old, then writers will not necessarily remember why certain documentation decisions were taken. Did a user specifically request it? Did a request come from management? Was there a situation that no longer exists that demanded the piece of information in question? This problem gains additional urgency when a new writer joins. Knowledge may be hidden in a Wiki, in emails, in other design documents or KB repositories. How does a writer parse this information and understand their doc? Further, how do new writers “learn” their doc set?

Understanding the User’s Mental Model

For a writer of any doc set large or small, accurately modeling the user’s mental model of the software and the documentation is hard. As software and doc get more complex, more considerations can arise.

  • Mental model of software: For a user of one of our products, their mental model isn’t just the result of reading that particular product’s doc set, but of reading the entire company’s doc set, reading StackOverflow and MATLAB Answers questions, reading tutorials on .edu sites, using other software etc.
  • Mental model for complex workflows: Here, a user requires a different understanding of the product than for simple workflows. However, when developing doc for complex workflows, the writers and developers may have trouble realizing their implicit assumptions, especially if they don’t have much exposure to the user’s experience.
  • Mental model of documentation: For writers intimately familiar with their doc, putting themselves in the framework of how a user uses doc takes effort. Users may not notice certain UI features, may not familiarize themselves with any custom doc search functionality (as opposed to using Google), and may not realize how the different sections of the page fit together because they only read examples.
  • Mental model of how users search, browse, and learn: Independent of the writer’s or company’s particular doc set is the question of how users search, browse, and learn information on the web. Users skim, skip, and read non-linearly. They start with code and read backwards. Writers need to remember this.

Translation and Internationalization

  • For a worldwide audience, translation may be necessary
  • Technical translation with source text in XML brings unique challenges
  • Certain phrases do not translate well. Ex: “find” in “find the value of” could translate as “search for” instead of “calculate”.
  • A word can have multiple translations based on context

Doc Formatting, Navigation, UI

There’s an essentially infinite list of possible UI tweaks. Some are easier to implement than others, but judging the ROI on each tweak from user behavior isn’t always easy. When UI is implemented, users might take a while to notice a new button or option. Thus, results aren’t instantly measurable. Further, as a doc set gets complex, ensuring a UI tweak works across all doc requires greater investment.

As an example of how there are infinite UI tweaks with hard-to-measure ROI, here are UI questions for code blocks:

  • “Copy code” button required?
  • Syntax highlighting for every language?
  • Add code line numbers?
  • Run code in product or browser with a click?
  • Add tabbed browsing for API code in different languages?

Another challenge is flat browsing versus the traditional hierarchical navigation. Users land on a page from search and follow links to related pages. Their navigation experience is flat, with links forming the edges of a graph. How does navigation UI adapt to this? Do we implement some sort of relational navigation?
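
As a rough illustration of what relational navigation could mean, the sketch below treats pages as nodes and links as edges, then suggests pages that are two clicks away, ranked by how many paths lead to them. It is Python, and the page names, the link table, and the function related_pages are all hypothetical.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "landing": ["install", "tutorial"],
    "install": ["landing", "troubleshooting"],
    "tutorial": ["landing", "api-reference", "troubleshooting"],
    "api-reference": ["tutorial"],
    "troubleshooting": ["install"],
}

def related_pages(page, links, limit=3):
    """Rank pages two clicks away from page by the number of paths reaching them."""
    direct = set(links.get(page, []))
    counts = Counter()
    for neighbor in direct:
        for target in links.get(neighbor, []):
            # Skip the page itself and pages it already links to directly.
            if target != page and target not in direct:
                counts[target] += 1
    return [name for name, _ in counts.most_common(limit)]

suggestions = related_pages("landing", LINKS)
print(suggestions)
```

A navigation panel built this way follows the links users actually traverse instead of the position of a page in a table of contents.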

Lastly, how can users have a smooth experience across software and documentation? Can documentation move inside software so that users don’t have to interrupt their thought process to look up documentation?

User Feedback and User Questions

To close the gap between what we think users want and what they actually need, user feedback is essential.

  • Solicit user feedback unobtrusively
  • Identify problem doc; problem doc can even expose API gaps
  • Forum for user questions; feed questions to the right writer
  • Writers can use feedback and questions to build their understanding of the user’s mental model
  • Ensure user questions get high-quality answers. User questions are framed in the user’s language and thus can perform well in search; if answers aren’t accurate or become outdated, then writers lose control of truth
  • A puzzle is how to control the truth on third-party (3P) sites such as StackOverflow, .edu sites, course materials, etc.
  • Finally, connect forums with relevant doc; show questions from forum on page
  • Essential that doc retains authority over 3P sites with users to ensure control over truth is maintained

Data Gathering and Web Metrics

Gathering data on user behavior and user needs can be challenging. Feedback can be qualitative instead of quantitative making it difficult to condense, analyze, or communicate.

  • Web metrics are hard to analyze for technical documentation because of the different possible interpretations for data. Example: Did the user bounce quickly because the page instantly gave them what they wanted, or because they really disliked the page?
  • The reasons behind user behavior on doc pages aren’t easy to discern, no matter how good the data
  • With MathWorks products, there is a practically unlimited amount of user data scattered across forums, .edu course materials, StackOverflow, etc.
  • Gathering data from inside the product presents its own challenge. How can 50,000 pages across ~50 products be summarized? How can quality be measured? How is the “amount” of doc measured so that resource requirements can be calculated? What does it mean for one page to get 1000 views/day and another to get 1 view/day? Can doc pages and product doc be compared to each other? How can pages in need of attention and pages that need to be deleted be identified? What content is out of date?
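
For the question of which pages need attention, one crude starting point is to combine traffic with page age. The Python sketch below does that. The PageStats fields, the thresholds, the bucket names, and the sample numbers are all assumptions for illustration, and a low-traffic page may still be required API documentation, so the output is a list to review, not a verdict.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    name: str
    views_per_day: float
    days_since_update: int

def triage(pages, busy=100.0, quiet=1.0, stale_days=730):
    """Sort pages into rough review buckets using hypothetical thresholds."""
    buckets = {"review first": [], "check if still required": [], "leave alone": []}
    for p in pages:
        stale = p.days_since_update > stale_days
        if stale and p.views_per_day >= busy:
            # Old and heavily read: outdated content here hurts the most users.
            buckets["review first"].append(p.name)
        elif stale and p.views_per_day < quiet:
            # Old and rarely read: may be obsolete, or may be required reference doc.
            buckets["check if still required"].append(p.name)
        else:
            buckets["leave alone"].append(p.name)
    return buckets

# Made-up sample data.
pages = [
    PageStats("page-a", 1000.0, 900),
    PageStats("page-b", 0.5, 1000),
    PageStats("page-c", 50.0, 10),
]
result = triage(pages)
```

Even a simple split like this gives writers a ranked place to start, which is more than raw view counts provide on their own.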

Scaling Processes and Standards

As a documentation organization scales and faces successively more complex challenges, introducing additional process and standards is tempting. How can the organization avoid process and standards creep? Of course, this challenge is not doc-specific but applies to all large organizations. And how does the doc org handle doc that gets no views but must be kept because it’s required API documentation?

Communicating Doc Perspective to Devs & Management

Developers are not as close as doc writers to the user or documentation. Thus, they may not have the context to understand the writer’s priorities and perspective. Building consensus on documentation perspective is a continuous challenge because the writer’s understanding of user behavior and the user’s mental model can differ from the developer’s. Further, the writer’s understanding may be based on a qualitative review of user feedback and user questions instead of hard quantitative data. To effectively communicate such qualitative understanding, writers should continuously communicate with their developers to demonstrate their understanding of user behavior, backed up with the qualitative or quantitative data at hand. This builds trust in the writer and brings developers on board with the writer’s decisions. Specifically, developers tend to prefer including more information than writers want to. If a writer has built trust and a relationship with their developers, then building consensus on what to include in doc becomes easier. Further, this relationship proves useful when writers provide input on UI and design.

High-Level Summary

  • Maintaining authority with users
  • Matching search results
  • Completing the loop with user feedback
  • Controlling the truth
  • Updating doc
  • Managing complex tagging and architecture
  • Tweaking UI
  • Understanding user’s mental model
  • Communication with devs and management
