Currently, linkvalidator does link checking like this:
- You can start link checking either in the scheduler or in the "Check links" tab by defining a start page and a depth
- Linkvalidator first collects all pages (it walks down the page tree from the start page up to the given depth; hidden pages can be ignored)
- It then deletes all broken links already detected for these pages from the list of broken links, tx_linkvalidator_links
- It then checks the records and fills tx_linkvalidator_links with the newly found broken links
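The steps above can be sketched roughly as follows. This is a simplified in-memory model, not linkvalidator's actual code: `Page`, `collect_pages`, `run_check` and the `is_broken` callback are hypothetical names, links are attached to pages instead of records, and skipping the whole subtree of a hidden page is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    uid: int
    hidden: bool = False
    links: list = field(default_factory=list)      # URLs found on this page
    children: list = field(default_factory=list)   # subpages


def collect_pages(start, depth, include_hidden=False):
    """Walk down from `start`, at most `depth` levels below it."""
    if start.hidden and not include_hidden:
        return []  # assumption: a hidden page hides its subtree as well
    pages = [start]
    if depth > 0:
        for child in start.children:
            pages.extend(collect_pages(child, depth - 1, include_hidden))
    return pages


def run_check(start, depth, broken_links, is_broken):
    """`broken_links` is a list of (page_uid, url) tuples, modified in place."""
    pages = collect_pages(start, depth)
    in_scope = {page.uid for page in pages}
    # Step 1: delete all known broken links for the pages in scope,
    # before a single link has been re-checked.
    broken_links[:] = [entry for entry in broken_links if entry[0] not in in_scope]
    # Step 2: check every link again and refill the list.
    for page in pages:
        for url in page.links:
            if is_broken(url):
                broken_links.append((page.uid, url))
    return broken_links
```

Between step 1 and the end of step 2, the list contains no entries for the pages in scope, even though none of the links has been fixed.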
This has several disadvantages:
- All broken links in scope are deleted at the beginning: if someone else is working on the broken links while a check is running, the entries suddenly disappear, and crawling again may take hours
- We check again and again, even if no records have changed
- Even though we may be checking a lot, there may still be broken links that go undetected, or broken links shown that no longer exist
The current mechanism is not well suited for large sites or for sites where several people work on broken links simultaneously.
It may check a lot, but the information still won’t be up to date.
You can start checking at any level (either in the scheduler or in the “Check links” tab), so the entire page tree may have been checked 7 days ago while a small subtree was checked 2 hours ago. This means the “last check time” may differ between parts of the page tree. There is no way for users to know whether they are looking at an up-to-date list of broken links.
What would be the best solution?
- Add incremental checking: only changed records are checked, see https://review.typo3.org/c/Packages/TYPO3.CMS/+/65622. This checks less, but some problems still remain
- Check directly when records are changed, e.g. on add, delete, and edit events. External links should not be checked synchronously, though, as this may take a long time
- When records are changed, add a link check request to a link check queue. A scheduler task can then work through the queue
4 … ?
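As an illustration of the queue-based option, here is a minimal in-memory sketch. All names (`LinkCheckQueue`, `on_record_changed`, `process_queue`, the `load_links` and `is_broken` callbacks) are hypothetical and do not exist in linkvalidator; a real implementation would presumably persist the queue in the database.

```python
from collections import deque


class LinkCheckQueue:
    """FIFO queue of (table, uid) requests, deduplicated while pending."""

    def __init__(self):
        self._pending = deque()
        self._queued = set()

    def enqueue(self, table, uid):
        key = (table, uid)
        if key not in self._queued:  # several edits in a row need only one check
            self._queued.add(key)
            self._pending.append(key)

    def pop(self):
        key = self._pending.popleft()
        self._queued.discard(key)
        return key

    def __len__(self):
        return len(self._pending)


def on_record_changed(queue, table, uid):
    """Hypothetical hook for add/edit/delete events: enqueue only, never check inline."""
    queue.enqueue(table, uid)


def process_queue(queue, load_links, is_broken, broken_links, limit=100):
    """One worker run, e.g. started by a scheduler task; handles at most `limit` requests.

    `broken_links` maps (table, uid) to the list of broken URLs of that record.
    `load_links(table, uid)` is expected to return [] for a deleted record.
    """
    processed = 0
    while len(queue) and processed < limit:
        key = queue.pop()
        found = [url for url in load_links(*key) if is_broken(url)]
        # Replace only this record's entries; all other entries stay untouched.
        if found:
            broken_links[key] = found
        else:
            broken_links.pop(key, None)
        processed += 1
    return processed
```

In this sketch, the list of broken links is never emptied wholesale, and slow external checks run in the worker rather than during the edit. It only reacts to record changes, though: a link whose target disappears without any record being edited would still need some form of periodic recheck.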