Fixing “Search Sucks” – Controlled Vocabularies, Taxonomy and Faceted Search
“Search sucks” is a common refrain from users of poorly designed websites. It is tempting to blame inconsistent or unexpected results on how the search algorithm has been configured and to assume that search itself is the source of the problem. More often than not, though, the algorithm is not at fault: the real cause is the lack of a well-defined controlled vocabulary categorizing the content that the algorithm searches. The first step toward fixing the problem is understanding what a controlled vocabulary is and how it supports both accurate search results and a consistent user experience across navigation and search.
Concepts & Controlled Vocabulary
Common within the fields of library science and indexing, but relatively unknown elsewhere, is the concept of a controlled vocabulary. A controlled vocabulary comprises concepts, their labels, their metadata and the relationships between them. It provides the rules used to tag, classify and index information in a content repository so that the information can be discovered and retrieved. Establishing a controlled vocabulary is the first step in creating a system that allows precision in navigation and search by reducing ambiguity in the labeling and organization of concepts. Because the same concept may have multiple names, and the same name or word may refer to multiple concepts, the focus of a controlled vocabulary should always be on concepts, not on names, keywords or terms.
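The distinction between concepts and their names can be sketched in a few lines of Python. This is a minimal illustration rather than a reference design: the `Concept` class, the concept ids and the sample labels (including the "Mountain Range" entry) are all hypothetical, chosen only to show one concept carrying several labels and one label pointing at several concepts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str         # stable identifier; labels can change without breaking tags
    preferred_label: str    # the single label shown to users
    alt_labels: tuple = ()  # synonyms and variants that refer to the same concept


# Hypothetical sample vocabulary, for illustration only.
CONCEPTS = [
    Concept("c-range-appliance", "Range", ("Stove", "Cooker")),
    Concept("c-range-mountains", "Mountain Range", ("Range",)),
]


def build_label_index(concepts):
    """Map each lower-cased label to the set of concept ids it can refer to."""
    index = defaultdict(set)
    for concept in concepts:
        for label in (concept.preferred_label, *concept.alt_labels):
            index[label.lower()].add(concept.concept_id)
    return index


LABEL_INDEX = build_label_index(CONCEPTS)


def concepts_for(term):
    """Return the sorted concept ids that a user-entered term may refer to."""
    return sorted(LABEL_INDEX.get(term.strip().lower(), set()))
```

In this sketch `concepts_for("Stove")` resolves to a single concept, while `concepts_for("Range")` returns two, which is exactly the ambiguity a controlled vocabulary is meant to surface and resolve. Content is tagged with the concept id, never with the label.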
Concepts within a controlled vocabulary must be simultaneously:
- Mutually Exclusive
Controlled vocabularies can take a number of different forms depending on the complexity of the information in the content repository. In order of increasing complexity these forms include, but are not limited to, taxonomies, thesauri and ontologies. The more complex the form, the more complex the rules around data governance, so it is advisable to use the least complex form that supports the use case.
Taxonomy is one of the most common forms of controlled vocabulary, yet it is one that is often misunderstood. The form of a taxonomy is always that of a hierarchy. However, not all hierarchies are taxonomies: neither Maslow’s Hierarchy of Needs nor a company org chart is a taxonomy. In a taxonomy, the relationship between categories can be read in both directions but is asymmetrical: the higher-level category represents a broader concept, while the categories directly beneath it represent narrower interpretations of the same concept.
For example, in a taxonomy of Appliances all Ranges are Appliances but not all Appliances are Ranges.
A simple check to determine whether you are looking at a taxonomy is to ask, for any two related categories, whether the concept below is a type of the concept immediately above. If the answer is yes, the hierarchy is more likely to be a taxonomy. Constructed in this way, a taxonomy allows users to easily navigate up and down a range of categories, providing visibility and clarity across the entire breadth of information in the content repository. The end result is a system of classification where each item is classified in one location and it is clear what that location is.
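The broader/narrower relationship can be sketched as a simple parent map. This is an illustrative sketch only: the `PARENT` table and the category names beyond Appliances and Ranges are hypothetical, and the code only records the hierarchy. Whether a child really "is a type of" its parent remains an editorial judgment made when the taxonomy is built.

```python
# Hypothetical taxonomy: each category maps to its single broader parent.
PARENT = {
    "Appliances": None,
    "Ranges": "Appliances",
    "Refrigerators": "Appliances",
    "Gas Ranges": "Ranges",
    "Electric Ranges": "Ranges",
}


def ancestors(category):
    """Walk upward and return the broader categories, nearest first."""
    chain = []
    parent = PARENT[category]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_type_of(narrower, broader):
    """True when `narrower` sits somewhere beneath `broader`."""
    return broader in ancestors(narrower)


def children(category):
    """Return the narrower categories directly beneath `category`."""
    return sorted(name for name, parent in PARENT.items() if parent == category)
```

Here `is_type_of("Ranges", "Appliances")` holds while the reverse does not, mirroring the asymmetry described above. Because each category has exactly one parent, every item tagged to a category has one unambiguous location.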
A faceted taxonomy is one in which additional attributes, or facets, are placed on categories within the taxonomy to capture specific information relevant to that category of items. In e-commerce, the attributes shown in navigation represent the concepts most relevant to a user choosing between products within a category. The values under an attribute are mutually exclusive and exhaustive of the options relevant to the existing selection of items. For example, on the category of Ranges, the values of the attribute Range Size may be 20 in., 24 in., 30 in., 36 in. and 48 in.
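A facet with controlled values lends itself to a simple data-quality check. The sketch below assumes a hypothetical list of catalog items (the SKUs and the bad values are invented for illustration) and shows two things: flagging items whose Range Size is missing or outside the controlled list, and deriving the filter values to display from the items actually in the selection.

```python
from collections import Counter

# Controlled values for the "Range Size" facet on the Ranges category.
RANGE_SIZE_VALUES = ["20 in.", "24 in.", "30 in.", "36 in.", "48 in."]

# Hypothetical catalog items classified to Ranges.
ITEMS = [
    {"sku": "R-100", "Range Size": "30 in."},
    {"sku": "R-200", "Range Size": "30 in."},
    {"sku": "R-300", "Range Size": "36 in."},
    {"sku": "R-400", "Range Size": None},      # missing value
    {"sku": "R-500", "Range Size": "31 in."},  # not a controlled value
]


def invalid_items(items, facet, allowed):
    """Return SKUs whose facet value is missing or outside the allowed list."""
    return [item["sku"] for item in items if item.get(facet) not in allowed]


def filter_counts(items, facet, allowed):
    """Count items per allowed value, keeping only values present in the selection."""
    counts = Counter(item.get(facet) for item in items)
    return {value: counts[value] for value in allowed if counts[value] > 0}
```

Storing a single value per item and facet keeps the values mutually exclusive by construction, and `invalid_items` catches the gaps that would break exhaustiveness. `filter_counts` returns only the sizes that exist in the current selection, so the user is never offered a filter that leads to zero results.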
Facets surfaced as filters are especially important when a category classifies so many items that scrolling through them would overwhelm a user. While it can be tempting to add filters capturing every potential aspect of the items in a category, generally only a few concepts are relevant to a user narrowing a selection in order to make a purchase. Surfacing only the filters that are relevant to purchase decisions keeps the list of filters manageable and ensures the user is not overwhelmed with options.
Faceted search is search configured to present results based on the categories and facets defined in the taxonomy. When a user types a term that is represented as a category within the taxonomy, the results present the items classified to that category. If a user enters a combination of a facet value and a category, the results match that combination, presenting the same results as if the user had used the navigation and filters. This ensures that results are consistent across the user experience, whether a user opts to use navigation, search or a combination of both.
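One way to picture this consistency is to have search resolve a query into a category and facet values, then reuse the same function that powers navigation. The sketch below is deliberately naive and rests on stated assumptions: the catalog, the `browse`, `parse_query` and `search` helpers are hypothetical, and matching is plain substring lookup rather than anything a production search engine would do.

```python
# Hypothetical controlled vocabulary: known categories and facet values.
CATEGORIES = {"ranges", "refrigerators"}
FACET_VALUES = {"30 in.": "Range Size", "36 in.": "Range Size"}  # value -> facet

# Hypothetical catalog, already classified against the vocabulary.
ITEMS = [
    {"sku": "R-100", "category": "ranges", "Range Size": "30 in."},
    {"sku": "R-200", "category": "ranges", "Range Size": "30 in."},
    {"sku": "R-300", "category": "ranges", "Range Size": "36 in."},
    {"sku": "F-100", "category": "refrigerators"},
]


def browse(category, filters=None):
    """Navigation path: pick a category, then apply facet filters."""
    filters = filters or {}
    return [
        item["sku"]
        for item in ITEMS
        if item["category"] == category
        and all(item.get(facet) == value for facet, value in filters.items())
    ]


def parse_query(query):
    """Naively resolve a query into (category, filters) by substring match."""
    text = query.lower()
    category = next((c for c in sorted(CATEGORIES) if c in text), None)
    filters = {facet: value for value, facet in FACET_VALUES.items() if value in text}
    return category, filters


def search(query):
    """Search path: resolve the query, then reuse the navigation logic."""
    category, filters = parse_query(query)
    if category is None:
        return []  # a real system would fall back to keyword search here
    return browse(category, filters)
```

Because `search` delegates to `browse`, a query such as `"30 in. ranges"` returns the same items as choosing Ranges in the navigation and applying the 30 in. filter. The design choice worth noting is that both paths read from one vocabulary, so they cannot drift apart.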
A controlled vocabulary is the foundation of search within any content repository that demands accuracy and precision in its results. In e-commerce, it allows customers to quickly and easily narrow thousands of results to exactly what they are looking for. The next time you try to access information and think “search sucks,” remember that the lack of a controlled vocabulary might be the cause of your frustration.