Enterprise Search at a Crossroads. Again.

Google have announced the discontinuation of the Google Search Appliance. So what happens next?

With the dust settling on Google's announcement that it is to discontinue and wind down sales of its on-premise search appliance offering, it would appear that Enterprise Search is, once again, at a crossroads. As Google officially doubles down on its effort to provide a cloud-based solution, albeit with neither features nor release dates yet announced, we look at what the announcement means for search, and for GSA customers in particular.

It should be said that this isn’t the first time we’ve seen a large software vendor deprecate a product. A few years back, Microsoft pulled support for FAST ESP in favour of the search built into SharePoint. It has happened before, and it will likely happen again.

But if you’re a GSA customer, the news may have been disquieting. How can you ensure continuity of your search solution beyond your term with Google Search for Work? Which platform should you consider switching to, and what challenges are you likely to face when you do? Valid questions, and as you might expect, there isn’t a one-size-fits-all answer. Your next best step is going to depend largely on the needs and specifics of your environment, the characteristics and landscape of your data, and the skills you have available.

What we can say is that if you’re a Twigkit GSA customer (directly, or through one of our partners), then there is good news: switching will not negatively impact your search application. Architected and built from the ground up with this kind of eventuality in mind, our technology abstracts and separates your search application from any and all underlying data providers. So if you need to swap your search engine for another, you can do so quickly, simply, and without affecting your overall investment in the application.


Search is not a commodity

One of our customers switched from FAST ESP to an open source provider, Elastic, after careful consideration. Their data was structured, editorially managed and (although paywalled) generally accessible. This brings us to a key question many of our customers are asking themselves: are open source search engines viable in the enterprise?

Here’s what we think: as it stands today, open source search has closed the gap on many of the commercial vendors, but there isn’t yet a truly viable open source contender when it comes to an end-to-end, out-of-the-box solution for the murky world of unstructured enterprise content.

Reading obscure binary formats, maintaining connectors for ever-changing, ageing repositories, and honouring convoluted access controls: these are all battles that have raged for decades, and they are smaller parts of a war that, to date, the open source community has been hesitant to join (we say this out of love: all of us at Twigkit are fierce proponents, advocates and committers in the community).

The bottom line is that unless your content is already structured and accessible, open source is going to present you with some challenges along the way.

Structured vs Unstructured Data

Nothing determines the quality and accessibility of data like structure. So before you start to price up your commercial options, take a good look at your data. The more structure that surrounds your assets, the easier it will be to make them searchable using open source alternatives like Solr and Elastic. The same principle applies to usability: more structure helps your search engine deliver the results your end users are trying to find.

Structure is good

If your data is highly structured (either because it’s sitting inside a database or because it’s in a structured format like JSON, CSV or XML), it tends to be very easy to index in something like Solr or Elastic. You will find both to be fast, capable and highly scalable, and Solr even has tools to help ingest this sort of data easily. If this is your world and you haven’t used these platforms before, be prepared to be amazed at the capabilities they offer: lightning-fast, accurate facets with rich support for complex aggregation and statistical analysis on the fly.

At Twigkit we use Solr and Elastic for business intelligence applications on billions of records; for a fraction of the cost, they offer capabilities matching those one might expect to find only in specialised commercial solutions.
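To illustrate why structured data is the easy case, here is a minimal sketch that turns CSV rows into the newline-delimited JSON body expected by Elasticsearch's `_bulk` API. It only builds the request body and sends nothing. The function name `csv_to_bulk_ndjson`, the index name `products` and the `sku` id column are hypothetical choices for this example.

```python
import csv
import io
import json


def csv_to_bulk_ndjson(csv_text: str, index_name: str, id_field: str) -> str:
    """Build an Elasticsearch _bulk request body (NDJSON) from CSV text.

    Each CSV row becomes two lines: an action line naming the target index
    and document id, followed by the document source itself.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        action = {"index": {"_index": index_name, "_id": row[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    # The _bulk API expects every line, including the last, to end with "\n".
    # An empty input yields an empty body rather than a lone newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical sample data: a tiny product catalogue.
    sample = "sku,title,category\n1,Red kettle,Kitchen\n2,Blue mug,Kitchen\n"
    print(csv_to_bulk_ndjson(sample, index_name="products", id_field="sku"), end="")
```

Every value arrives as a string here. In practice you would define an index mapping so that numeric and date fields are typed correctly, and that kind of explicit structure is what makes fast, accurate facets and aggregations possible.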


 
A recent Elastic-powered Twigkit application.

A Messy Room

If the data landscape of your organisation is heavily siloed and involves content in many different formats spread across many different locations (file shares, Documentum, SharePoint, Lotus Notes and others), you will almost certainly need proprietary connectors to extract your content accurately and restrict access correctly (more on that later).

In our experience, these established, monolithic software solutions rarely remain static for long. Between versions, much of what determines how your content is stored and structured can change, making proper extraction of that content something of a moving target.

And of course, the more solutions (and versions of each solution) you have, the more challenges you’ll face. Our advice would be to take a good look at the vendors (of either connectors or full-stack search engines) whose offerings most closely fit your needs.

Ultimately the structure and accessibility of your data should strongly steer your final decision. If you need informed, impartial advice on the subject, we and our partners are very happy to help.


Security

“Secure search” can mean a number of different things, but an important consideration for anyone evaluating a search platform is whether they will require support for security at the document level.

Document-level security controls govern access to every document in your search application on an individual, per-document basis. Access to each document is granted or denied based on permissions, which are generally set at either user or group level. These permissions derive from privileges assigned in the file system or a content management system, usually in conjunction with something like Active Directory. This matters because those privileges need to be captured and stored alongside the documents themselves at the time they’re indexed, so that access controls can be enforced at query time.

Commonly known as security trimming or early-binding security, this is really the only scalable, reliable and usable way of making sure your end users don’t see data that they shouldn’t.
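The idea behind early binding can be sketched in a few lines. In this toy, in-memory example (the class and field names, such as `TrimmedIndex` and `allowed_groups`, are hypothetical), each document carries the set of groups allowed to see it, captured at index time, and every query only returns documents whose groups overlap with the searching user's groups. A real engine would express the same rule as a filter on an indexed ACL field, with the user's group memberships resolved from a directory such as Active Directory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    doc_id: str
    text: str
    # Copied from the source system's permissions when the document is indexed.
    allowed_groups: frozenset[str] = field(default_factory=frozenset)


class TrimmedIndex:
    """Toy in-memory index illustrating early-binding security trimming."""

    def __init__(self) -> None:
        self._docs: list[IndexedDocument] = []

    def add(self, doc: IndexedDocument) -> None:
        self._docs.append(doc)

    def search(self, term: str, user_groups: set[str]) -> list[str]:
        term = term.lower()
        results = []
        for doc in self._docs:
            # Security filter: the user must share at least one group with the
            # document's stored ACL. It is applied as part of the query, so
            # unauthorised documents never reach the result list.
            if not doc.allowed_groups & user_groups:
                continue
            if term in doc.text.lower():
                results.append(doc.doc_id)
        return results


if __name__ == "__main__":
    index = TrimmedIndex()
    index.add(IndexedDocument("hr-1", "Salary review", frozenset({"hr"})))
    index.add(IndexedDocument("pub-1", "Salary bands overview", frozenset({"all-staff", "hr"})))
    print(index.search("salary", {"all-staff"}))  # only pub-1 is visible
    print(index.search("salary", {"hr"}))         # both documents are visible
```

The query-time check is the easy part. The hard part, and the reason robust connectors matter, is keeping `allowed_groups` accurate as permissions change in the source systems.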

If document-level security is something that your organisation needs, we strongly suggest that you commit to a commercial vendor that offers this capability out of the box.

Cloud or On-Premise

Finally: to cloud or not to cloud? Google have announced their plans, although details are scant at the moment. Other major players already offer hosted versions of their own proprietary search engines, as well as managed versions of Elastic and Solr (or Solr-based engines).

We have experience with these services and feel that they really shine in their ability to scale automatically with content volume and query load. For publishers with a large amount of public content and/or simple security models, this is an attractive option. For more sensitive industries, the jury is still out: you may feel more comfortable knowing that your information is housed somewhere you can see it.

Spoilt for choice

There are strengths and weaknesses to all the big and up-and-coming players in the space. HP IDOL is the veteran heavyweight of enterprise search. With Solr at its core, Lucidworks Fusion offers a library of connectors and document-level security trimming. Niche player Attivio offers highly capable technology that does all of the above whilst also closing the gap between database and search, worlds which used to remain firmly apart. Similarly, NoSQL vendors like MarkLogic have started to move into the market with built-in search and discovery capabilities of their own.

These platforms, like the many others available, all have their particular costs, strengths and weaknesses. Familiarise yourself with the capabilities and support that each one offers, and map that against your budget, enterprise preferences, and the nuances of the problems your organisation is trying to solve.

Applications

Whether you’re building a search solution or surfacing analytical information from your data lake, your choice of stack is important. Your application represents a significant investment (especially if you want to get it right), and whether it is built from scratch or tightly coupled with a vendor, there will be some challenges along the way.

One thing is certain: in a world where changes in vendor policy and the continued rise of better, more capable technologies can cause disruption, your investment needs to be secure. It has always been best practice to separate your application from your underlying data source as much as possible, and that remains true today.

This sort of portability means flexibility: it gives you the power to adopt best-of-breed technology (read: whatever is most suitable for you) at any time, and it safeguards the investment in your business applications against factors outside your control.
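One common way to achieve this separation is an adapter layer: the application talks to a single interface, and a small adapter per engine translates that engine's native response into a shared shape. The sketch below assumes the general response layouts of Solr (`response.docs`) and Elasticsearch (`hits.hits`, with fields under `_source`). The `SearchProvider` interface, the adapter classes and the canned responses are hypothetical and are not Twigkit's API. Transports are passed in as plain functions so the example runs without a server.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Protocol


@dataclass(frozen=True)
class SearchHit:
    """Engine-neutral result shape that the application works with."""
    doc_id: str
    title: str


class SearchProvider(Protocol):
    def search(self, query: str) -> list[SearchHit]: ...


class SolrAdapter:
    """Maps a Solr-style JSON response (response.docs) onto SearchHits."""

    def __init__(self, send: Callable[[str], dict[str, Any]]) -> None:
        self._send = send  # e.g. an HTTP call to a Solr /select handler

    def search(self, query: str) -> list[SearchHit]:
        body = self._send(query)
        return [SearchHit(d["id"], d["title"]) for d in body["response"]["docs"]]


class ElasticsearchAdapter:
    """Maps an Elasticsearch-style JSON response (hits.hits) onto SearchHits."""

    def __init__(self, send: Callable[[str], dict[str, Any]]) -> None:
        self._send = send  # e.g. an HTTP call to an index's _search endpoint

    def search(self, query: str) -> list[SearchHit]:
        body = self._send(query)
        return [
            SearchHit(h["_id"], h["_source"]["title"]) for h in body["hits"]["hits"]
        ]


def render_results(provider: SearchProvider, query: str) -> list[str]:
    """Application code: depends only on the SearchProvider interface."""
    return [f"{hit.title} ({hit.doc_id})" for hit in provider.search(query)]


# Canned responses stand in for real servers in this sketch.
def fake_solr(query: str) -> dict[str, Any]:
    return {"response": {"numFound": 1, "docs": [{"id": "7", "title": "Annual report"}]}}


def fake_elasticsearch(query: str) -> dict[str, Any]:
    return {"hits": {"hits": [{"_id": "7", "_source": {"title": "Annual report"}}]}}


if __name__ == "__main__":
    # Swapping engines changes only which adapter is constructed.
    print(render_results(SolrAdapter(fake_solr), "report"))
    print(render_results(ElasticsearchAdapter(fake_elasticsearch), "report"))
```

Swapping engines then means writing (or configuring) a new adapter, while `render_results`, standing in for the rest of the application, stays untouched.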

One thing to remember: it’s tempting to think of the application layer and the user experience as the icing on the cake, but don’t be fooled: it is the cake. Your application is the way people access, search and interact with your data, so make sure that it’s well thought out, and planned and budgeted for appropriately.


If you have questions or comments, or have a project of your own in mind, we would love to talk to you about it. Please don’t hesitate to drop us a line, or to give us a call on +44 (0)1223 653 163 (UK) or (408) 678-0400 (North America).