How we handle data at Wuha

It's only natural to ask what we do with your data. Data handling is a sensitive subject, both for security and for company ethics, and you have a right to know how Wuha works. We handle large amounts of data, particularly data from your company, and this document explains how we handle it, what we store, and why. It does not cover how we handle personal information under the GDPR; to learn more about the privacy of your data, please read our privacy policy. Remember: your data is your property. We take only the data necessary to provide you with the best service, and you always have the right to retrieve and delete your data from our systems.

What data does Wuha collect?

The Wuha search application connects a large number of your applications and, as a consequence, collects a large amount of data. The objective of this collection is to understand your professional information in order to provide the best search results. Thus, Wuha needs to have access to the data contained in these applications.
Different data is collected depending on the type of application connected. Wuha can be connected to:

  • Document Storage applications (Google Drive, Dropbox, Box, your PC) allow you to search for documents based on their:
    • name and textual content
    • creation and modification dates
    • format (PDF, Word, PowerPoint)
    • authors, editors, and contributors
  • Communication applications (Outlook, Gmail, Slack) allow you to search for messages and attachments based on their:
    • subject, content of the message, or content of the attachment
    • format of the attachment (PDF, Word, PowerPoint)
    • date that the message or email was sent
    • email address of the sender or recipients

You are in control!

No need to panic: your data is secure and you keep full control. Through Wuha, you can:

  • Disconnect an application from Wuha. The association is deleted and all of its data is removed from our servers.
  • Delete a document, an email, or another piece of information from the application. The data is immediately deleted from Wuha.
  • Download your personal data via My Account > Confidentiality > Download your data.
  • Delete your data. Besides disconnecting an application, you can also ask us to delete all of your data.
  • Permanently delete your account, which deletes all of your data from the Wuha system.

Wuha is not a storage system. We do not ask you to set up access rights again: we rely on the ones you have already configured in your applications. When we collect your data, we also collect who has access to that data. If access within your system changes, Wuha updates its access lists accordingly. Only users who have access to a document will find that document in their Wuha searches.
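
To make the idea of access lists concrete, here is a minimal sketch in Python. It is an illustration only, not Wuha's actual code: the names IndexedDocument, sync_access, and visible_results are hypothetical, and we assume each indexed document simply carries the set of users allowed to read it.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    """Hypothetical record: a document plus the users allowed to read it."""
    doc_id: str
    title: str
    allowed_users: set[str] = field(default_factory=set)


def sync_access(doc: IndexedDocument, current_users: set[str]) -> None:
    """Replace the stored access list with the one reported by the source application."""
    doc.allowed_users = set(current_users)


def visible_results(results: list[IndexedDocument], user: str) -> list[IndexedDocument]:
    """Keep only the documents the searching user is allowed to read."""
    return [doc for doc in results if user in doc.allowed_users]


report = IndexedDocument("d1", "Q3 report", {"alice", "bob"})
contract = IndexedDocument("d2", "Supplier contract", {"alice"})

# Bob can only see the report at first.
print([d.title for d in visible_results([report, contract], "bob")])

# Access to the contract is later granted to Bob in the source application.
sync_access(contract, {"alice", "bob"})
print([d.title for d in visible_results([report, contract], "bob")])
```

In this sketch the access list is checked at search time, so a change made in the source application affects results as soon as it has been synchronised.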

Your data gives the best search results

Your data is encrypted in transit, and it allows our data scientists to work on giving you the best possible search results. Our team uses NLP (Natural Language Processing) techniques to understand both your data and your search queries. For us, NLP is a pipeline: it takes your data as input and uses our models to extract and structure the content. When you run a search, we apply the same pipeline to your query and match the result against your indexed data.
Here is a general outline of the steps we take to structure and understand your data:

  1. Collection of the necessary data from the connected application (e.g. Google Drive)
  2. Extraction of the text content and metadata of documents. We identify several characteristics of your information, including:
    • the language of the document
    • the type of document (invoice, CV, purchase order form) using our custom classification models
    • whether the document is similar to other documents. Are there multiple versions of the document? Which is the most relevant?
  3. Indexing and enrichment of the extracted content. We index the documents in an Elasticsearch cluster, and enrich them by:
    • Removing common words such as "the", "and", "they"
    • Reducing words to their base form by removing plurals, possessives, and conjugations
    • Extracting named entities such as people, places and dates
    • Understanding commonly used expressions and phrases
  4. Understanding your query. Once we've indexed your data, we just need to know what you're searching for. We analyse your query with techniques similar to those used for document extraction, including extracting names of people, places, and dates. Once we understand your query, it's a matter of matching it against your documents.
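
The steps above can be pictured with a small Python sketch. It is a toy under stated assumptions, not our production pipeline: the stop-word list, the crude suffix stripping, and the function names normalize, build_index, and search are all hypothetical, an in-memory dictionary stands in for the search cluster, and named-entity extraction is left out. What it does show is the key idea that documents and queries go through the same normalisation before they are matched.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; a real one would be longer and language-specific.
STOP_WORDS = {"the", "and", "they", "a", "of", "for", "to", "in"}


def normalize(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and crudely strip possessives and plurals."""
    terms = []
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        token = token.strip("'")
        if not token or token in STOP_WORDS:
            continue
        if token.endswith("'s"):
            token = token[:-2]          # supplier's -> supplier
        elif token.endswith("s") and len(token) > 3:
            token = token[:-1]          # invoices -> invoice (very rough)
        terms.append(token)
    return terms


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized term to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank documents by how many normalized query terms they contain."""
    scores: dict[str, int] = defaultdict(int)
    for term in set(normalize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


documents = {
    "invoice-17": "Invoices for the supplier's orders",
    "cv-04": "Curriculum vitae and references",
}
index = build_index(documents)
print(search(index, "supplier invoice"))
```

Because the query "supplier invoice" is normalised the same way as the document text, it matches a document that only contains "Invoices" and "supplier's".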

Finally, Wuha learns what information is important to you and adapts to your behaviour. If you spend a lot of time searching for emails, we'll start suggesting emails as better results. The more you use Wuha, the more benefit it brings.
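
One simple way to picture this kind of adaptation is a re-ranking step that favours the types of results you open most often. The Python sketch below is purely illustrative and assumes a scheme of our own invention for this example: the rerank function, the click counts, and the weight parameter are hypothetical and do not describe how Wuha actually learns.

```python
from collections import Counter


def rerank(results: list[tuple[str, str, float]], clicks: Counter, weight: float = 0.5) -> list[str]:
    """Re-order (doc_id, source_type, base_score) results, boosting source types the user opens often."""
    total = sum(clicks.values())

    def boosted(result: tuple[str, str, float]) -> float:
        _, source_type, base_score = result
        # Share of past clicks on this source type; 0 when there is no history yet.
        share = clicks[source_type] / total if total else 0.0
        return base_score * (1.0 + weight * share)

    return [doc_id for doc_id, _, _ in sorted(results, key=boosted, reverse=True)]


results = [("drive-1", "document", 1.0), ("mail-7", "email", 0.9)]

# With no history, the original order is kept.
print(rerank(results, Counter()))

# A user who mostly opens emails sees the email first.
print(rerank(results, Counter({"email": 8, "document": 2})))
```

With no click history the base relevance score decides the order; as history accumulates, the boost grows in proportion to how often each type of result is opened.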