How to implement Qliksense - Text Analysis - Qlik Community

Venkatesh_5990

Hi Friends,

I got a requirement that is Categories the types of Incidents from ServiceNow data.

Our source data from ServiceNow, every day we will load data from ServiceNow to Qlik sense. Actually the data flow is ServiceNow >> Gsheet >> Qliksense.

Actually the requirement is in the below:

Please find the below sample incident data which was raised by users with different short descriptions.

Incident_number	Short Description	Created By	Open Date	Due Date	State
INC001	Unable to access XYZ stream, please provide access	ABC	3/27/2024	4/1/2024	Open
INC002	Got error message like 'Access Denied' while accessing the ABC dashboard URL link	CDE	3/26/2024	4/1/2024	Open
INC003	I am unable to access the folder	FGH	3/25/2024	4/1/2024	Open
INC004	I am unable to sign in Workday portal	IJK	3/24/2024	4/1/2024	Open
INC005	Can you grand me access to the database	LMN	3/23/2024	4/1/2024	Open
INC006	Batch job failed to run	OPQ	3/22/2024	4/1/2024	Open
INC007	XYZ Dashboard qmc task aborted	RST	3/21/2024	4/1/2024	Open
INC008	Failed - Informatica xyz_inflow task incomplete	UVW	3/20/2024	4/1/2024	Open

Here row number from 1 to 5 should fall under "Access Related Issue" category and row number from 6 to 8 should fall under "Batch Job Related Issue".

Category	Count of Incidents
Access Related Issue	5
Batch Job Related Issue	3

How can we achieve this. Can you please help me this.

Appreciate your help and thanks in advance.

F_B

This is a very hard task without using a flag or a default list of descriptions.

Venkatesh_5990

Thanks @F_B for your comment.

Yes it is hard task. I tried with wildmatch function with hardcode a specific words but clients are not accepting. They are approaching ML and Python way to implement this task.

I am not an expert in ML and even Qlik also so want your all help for this post.

Thank you.

BrunPierre

Indeed. Perhaps you can use SubStringCount() or WildMatch() to identify specifics in the text.

Venkatesh_5990

Actually clients requested to don't use hardcode anything in the dashboard. Why because please see the below example you can understand the reason of it

Ex. If the user creating an incident regarding about server issues. Accidentally user mentioned the short description like "serrver" instead of "server".

But in our wildmatch function and Subfield function we wrote like "server" only then this incident won't be fall under "Server Related Issue" Category right.

I hope you can understand.

marcus_sommer

The difficulties are mainly not related to the used tool else to define the appropriate rules and logic which are suitable to the scenario. Implementing it in any tool will have their challenges and efforts.

If I had this task I would probably do it in Qlik with the following measurements:

splitting the text in single words with something like: subfield(Text, ' ', iterno()) within a while-loop
classifying the content into verbs, adjectives, expletive and so on
filtering against the word-types
filtering against excluding-list
cleaning spaces/quotes/commas and so on
correcting typos per prepared mapping-lists
setting everything to upper/lower
mapping against target-words (detecting them as well as returning/linking the cluster-values and also prioritizing them)
creating a scoring-value against the matches
concat() of all matches

whereby the real order may differ to the provided listing and also be combined in some way and not always applied sequentially.

Like hinted above the main-work is not to apply the various measurement in one or two dozens statements else defining all the stuff. Getting a proper categorizing for 30% to 50% will be surely quite easy, the following 10% - to 20% will need more efforts and the costs to get 90% to 98% will definitely increase exponentially and you won't never reach 100%.

Much simpler as the above would be to include some mandatory fields within the incident data with pre-defined values in a dropdown.

Venkatesh_5990

Hi @marcus_sommer ,

Thanks for your update. But here you are using hardcode the text right.

Can we approach different way without hardcoding the text. Is it possible ?

marcus_sommer

Of course, it was meant without hard-coding the most things else using external maintained lists and rules - maybe within Excel - and loading them as mappings against the text/words.

The company-definitions are specific and would need to be done from your business-departments but the other things like the mentioned classifying/cleaning/preparing could happens against (commercial) dictionaries and rules respectively derived programming logic.

Just a further hint - instead of diving very deep in the cleaning-measurements by looking and fixing double-chars within the words or correcting very common mistakes by chars like p and b / t and d and so on you may also implement the LevenshteinDist - script and chart function | Qlik Cloud Help to check the similarity between the words.

How to implement Qliksense - Text Analysis

Developers

Qlik Sense

Visualization