MoeyE
Partner - Creator II

Considerations of having multiple Replicate servers reading the same source

Hi team,

I need some guidance. What should be considered when one Replicate server already has almost 100 tasks reading from the same source endpoint, and a second server is about to be added to read from that same endpoint?

My current thoughts:

* Estimated number of tasks on the new server

* Estimated number and size of tables on the new server

* Any LOBs in the tables

* Whether there is currently any source latency indicating the source server is already under too much stress

What considerations am I missing? Thanks.

Regards,

Mohammed

11 Replies
aarun_arasu
Support

Hello @MoeyE ,

Thanks for reaching out to the Qlik Community.

I would recommend considering a "Log Stream" task if you have multiple tasks reading from the same source.

Please refer to the user guide below:

https://help.qlik.com/en-US/replicate/November2023/Content/Replicate/Main/Log%20Stream%20Staging/int...
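
To make the pattern concrete, here is a minimal sketch in plain Python (an illustration of the fan-out idea only, not Replicate's actual implementation): a single staging task is the only reader of the source change log, and each downstream task consumes the staged copy instead of opening its own connection to the source.

```python
# Minimal sketch of the Log Stream fan-out idea (illustration only,
# NOT Replicate's implementation): the source change log is read ONCE,
# and every downstream task consumes the staged copy.
import queue
import threading

# Stand-ins for 3 downstream replication tasks, each with its own staging feed.
consumers = [queue.Queue() for _ in range(3)]

def staging_task(change_log):
    """The only component that touches the source change log."""
    for change in change_log:
        for c in consumers:               # fan each change out to every child task
            c.put(change)

def replication_task(name, inbox):
    """Child task: applies staged changes, never connects to the source."""
    while True:
        change = inbox.get()
        if change is None:                # sentinel: end of stream
            break
        print(f"{name} applies {change}")

threads = [threading.Thread(target=replication_task, args=(f"task{i}", c))
           for i, c in enumerate(consumers)]
for t in threads:
    t.start()
staging_task(["INSERT t1", "UPDATE t2", "DELETE t1", None])  # source log read once
for t in threads:
    t.join()
```

However many child tasks you add, the source only ever sees the one staging reader.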

 

Regards

Arun

aarun_arasu
Support

Hello Team,

If our response has been helpful, please consider clicking "Accept as Solution". This will assist other users in easily finding the answer.

Regards,
Arun

Heinvandenheuvel
Specialist II

Let's take a step back... WHY do you think you should have 100 tasks reading from the same source endpoint?

IMHO, the only valid reason is having many different target endpoints.

Are you using Logstream already? You should.

With logstream in place, what is the indication that you might need a second Replicate server?

There is nothing wrong in using a second server, but you should have a solid reason.

Examples of bad reasoning:

- Our tasks are not allowed to have more than NN tables each.

- Our tasks are not allowed to mix source schemas because those represent different customers which must be kept separate. Yeah - no!

 

MoeyE
Partner - Creator II
Author

Hi Hein,

Thanks for the answer. Yes, the reason is to reduce load/stress on the current Replicate server. There is a new project which I believe will be large, so that's why the need for more servers has appeared. Also, the plan is to add a new target (Snowflake) on the new server. Yep, logstream is a no-brainer; I'll ensure it is configured efficiently.

Regards,

Mohammed

Heinvandenheuvel
Specialist II

Thanks for the clarification Mohammed.

But what I do NOT see is WHY you have hundreds of (CDC?) tasks planned for a single source endpoint.

This question is in the context of CDC tasks, right? For full load, multiple tasks may well serve scheduling purposes.

Some folks come up with a silly, arbitrary rule that no CDC task shall have more than 100 tables, and with 5,000 tables they think they need 50 tasks. Nonsense! They need 1, maybe 5 tasks, but no more, and they will reduce the Replicate server overhead by a factor of 10 just by NOT having to re-read the change log (logstream or direct) over and over.
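
A back-of-the-envelope sketch of that arithmetic (the daily log volume is an assumed number, purely for illustration):

```python
# Illustration of why fewer CDC tasks means less source load: every direct
# CDC task scans the ENTIRE change log, regardless of how many tables it owns.
daily_log_gb = 50            # assumed daily transaction-log volume (made up)
tables = 5000

for tasks in (50, 5, 1):     # 50 = the "100 tables per task" rule; 1 = single task
    log_read_gb = tasks * daily_log_gb   # each task reads the whole log
    print(f"{tasks:>2} task(s), ~{tables // tasks} tables each: "
          f"{log_read_gb} GB of log read from the source per day")
```

Going from 50 tasks to 5 cuts the log read volume from 2,500 GB/day to 250 GB/day on the same assumed 50 GB log: the factor of 10 above.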

What I also do NOT see is an indication of having multiple target endpoints, for which multiple tasks would be unavoidable. I only hear about maybe 1 more Snowflake target, not dozens. So why not just 1 task for everything you had and 1 more for Snowflake?

Push your customer for a good, concise answer before going for a knee-jerk 'just add a server' solution (workaround!).

DesmondWOO
Support

Hi @MoeyE ,

The execution of 100 CDC tasks will establish 100 connections for reading the transaction log. If these tasks are all directed towards the same database server, it could result in significant server load.
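
As a rough model of the topology (my assumption, not from Qlik documentation): with direct CDC each task holds its own log-reader connection on the source, while with Log Stream only the single staging task does.

```python
# Rough connection-count model (assumed topology, illustration only).
def source_connections(n_tasks: int, use_log_stream: bool) -> int:
    """Direct CDC: one log-reader connection per task.
    Log Stream: only the staging task touches the source."""
    return 1 if use_log_stream else n_tasks

for n in (10, 100):
    print(f"{n:>3} direct CDC tasks     -> {source_connections(n, False)} source connections")
    print(f"{n:>3} tasks via Log Stream -> {source_connections(n, True)} source connection")
```

The child tasks still do work, but it lands on the Replicate server's staging folder instead of the source database.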

Regards,
Desmond

MoeyE
Partner - Creator II
Author

Hi @DesmondWOO,

Yep, this makes sense. Please help my understanding: is the main concern too many connections from Replicate, or too many tasks reading from one transaction log?

It's both, right? Too many connections and too many reads of the same database's transaction log are both causes of overhead?

Regards,

Mohammed

Heinvandenheuvel
Specialist II

@MoeyE >>> yes: "It's both, right? Too many connections and too many reads of the same database's transaction log are both causes of overhead?"

Too many connections all reading and interpreting the same transaction log will cause too much overhead on the source server to deliver the data, and too much overhead on the Replicate server to interpret that data over and over.

MoeyE
Partner - Creator II
Author

Hi Hein,

Thanks. Take the theoretical scenario where a server hosts 10 different databases and there are 10 different logstream staging tasks, each reading one of the databases. There are 10 connections to the server, but only 1 connection to each database, so my understanding is that not much overhead is created in this scenario.

So the issue is mainly too many connections to the same single database, not too many connections to the server that contains those databases. Thanks for the help, it's truly appreciated.

Regards,

Mohammed