
    The life of your data at Wuha

    You are wondering how we handle your data, which is entirely understandable. First, because it is a sensitive subject in terms of both security and ethics. Second, because we work with millions of data items, yours among them. We will not cover here the handling of your personal data under the General Data Protection Regulation (GDPR), which is described in our privacy policy. Instead, we will explain how we proceed to give you the best search experience, and for that you will discover how our Artificial Intelligence works. Please also note that we do not hold your data hostage: you can retrieve it and delete it permanently.

    What does Wuha collect?

    Wuha lets you connect a number of applications and, as a result, collects a significant amount of data. Since the goal is to let you get the best information from various sources, Wuha must have access to the data contained in these applications.

    The type of data collected when you connect an application depends on the nature of that application. Connecting Wuha to:

    • Storage applications (Google Drive, Dropbox, Box, Computer...) gives you easy access to your documents by searching through their:

      • name and content
      • date of creation or modification
      • format (pdf, docx, pptx...)
      • authors & contributors
    • Email applications (Microsoft Outlook, Gmail, Slack...) lets you find the most relevant emails and attachments by searching with:

      • the subject, label or content of the email and its attachments
      • the format of the attachment (pdf, docx, pptx...)
      • the date on which the email was sent or received
      • the email address of the sender, recipient or other contacts in the email thread
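
    As an illustration only, the searchable fields listed above could be modeled as two simple records. This is a sketch under assumptions: the class names, field names and the `matches_format` helper are hypothetical and do not describe Wuha's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DocumentRecord:
    """Hypothetical shape of a file indexed from a storage application."""
    name: str
    content: str
    file_format: str  # e.g. "pdf", "docx", "pptx"
    created_at: Optional[datetime] = None
    modified_at: Optional[datetime] = None
    authors: List[str] = field(default_factory=list)


@dataclass
class EmailRecord:
    """Hypothetical shape of a message indexed from an email application."""
    subject: str
    content: str
    sender: str
    recipients: List[str] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)
    sent_or_received_at: Optional[datetime] = None
    attachments: List[DocumentRecord] = field(default_factory=list)


def matches_format(record: DocumentRecord, wanted: str) -> bool:
    """Return True when the document has the requested format (case-insensitive)."""
    return record.file_format.lower() == wanted.lower().lstrip(".")
```

    Modeling attachments as `DocumentRecord` items reflects the fact that an attachment can be searched by the same criteria as a stored document (content and format).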

    Your data for the best results

    Your data is securely encrypted, and it lets our Data Scientists offer you the best possible results from the phenomenal amount of data at your disposal! All of our team's work is based on natural language understanding. NLP (Natural Language Processing) makes it possible to link your requests to the content of a document. To do this, the NLP is organized as a "pipeline": to absorb the complexity of our Machine Learning model, we break each request down into a series of simpler processes.

    In the interest of transparency, here are the steps on which our AI (Artificial Intelligence) relies:

    1. Exploration of the application you are connecting (e.g. Google Drive)
    2. Raw extraction of textual data and document enrichment. Among other things, this step identifies:

      • the language of the document,
      • the type of document (invoice, resume, purchase order, etc.), using a classification algorithm trained with supervised learning techniques,
      • similarity with other documents, which allows us to group these documents together for display.
    3. Cleaning and enrichment: the data is sent to the Elasticsearch cluster, which is responsible for:

      • Standardizing the data to optimize search:

        • routing the data to the index for the corresponding language
        • removal of "stop words" (the, it, we, me)
        • lemmatization: reducing words to their base form by removing plural and gender (masculine/feminine) inflections
        • removal of quotes and accents
        • conversion to lowercase
      • Enriching the data:

        • detection of dates and first/last names
        • automatic search for synonyms and acronyms present in the text
        • identification of nominal groups
    4. Processing your request: our system, developed by our team of Data Scientists, uses Deep Learning techniques to perform NER (Named Entity Recognition) on requests in a fraction of a second. With these techniques, we can identify whether a request concerns a person, a date, a place, a nominal group or a file extension.
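
    To make the standardization part of step 3 more concrete, here is a minimal Python sketch of the same kind of text normalization: lowercasing, removal of quotes and accents, stop-word filtering and a crude plural reduction. It is an illustration only. At Wuha this work is done by the Elasticsearch cluster, and the function names, the four-word stop list and the plural rule below are assumptions made for the example.

```python
import unicodedata
from typing import List

# Tiny illustrative stop-word list; a real list is per language and much longer.
STOP_WORDS = {"the", "it", "we", "me"}

# Straight and typographic quote characters to strip (illustrative set).
QUOTES = "'\"’‘“”«»"


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_singular(token: str) -> str:
    """Rough stand-in for lemmatization: drop a trailing 's' on longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(text: str) -> List[str]:
    """Lowercase, remove quotes and accents, drop stop words, reduce plurals."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", QUOTES))
    text = strip_accents(text)
    tokens = text.split()  # whitespace split only; punctuation is left attached
    return [naive_singular(t) for t in tokens if t not in STOP_WORDS]


print(normalize('We sent the "Résumés" to Zoé'))
# ['sent', 'resume', 'to', 'zoe']
```

    A real lemmatizer relies on a dictionary or a language model rather than a suffix rule; the point here is only the order of the steps, with each one making indexed text and search requests easier to compare.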

    The algorithms chained together in our NLP pipeline extract the best candidates from each connected source. A final processing step by our AI then gives you relevant results: based on your experience, Wuha will suggest the most appropriate documents.
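
    The idea behind step 4, tagging the parts of a request, can be illustrated with a deliberately simple rule-based sketch. It does not reproduce the Deep Learning NER model described above: it only recognizes file extensions and dates, and leaves everything else (people, places, nominal groups) as free-text terms. The function name, the extension set and the date pattern are assumptions made for the example.

```python
import re
from typing import Dict, List

# Extensions taken from the formats mentioned in this article; illustrative set.
KNOWN_EXTENSIONS = {"pdf", "docx", "pptx"}

# ISO-like dates (2019-03-15) or bare four-digit years (2019).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}|(?:19|20)\d{2}")


def tag_request(request: str) -> Dict[str, List[str]]:
    """Split a request into file extensions, dates and remaining free-text terms."""
    tags: Dict[str, List[str]] = {"extension": [], "date": [], "terms": []}
    for token in request.split():
        bare = token.lower().strip(".,;:!?")
        if bare in KNOWN_EXTENSIONS:
            tags["extension"].append(bare)
        elif DATE_PATTERN.fullmatch(bare):
            tags["date"].append(bare)
        else:
            tags["terms"].append(token)
    return tags


print(tag_request("invoice Dupont 2019 pdf"))
# {'extension': ['pdf'], 'date': ['2019'], 'terms': ['invoice', 'Dupont']}
```

    Once a request is tagged this way, each tag can be turned into a filter (format, date) while the remaining terms are matched against names and content.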
