Multi-domain errors cause sitemapindex XML confusion #305

Open
NiklasBr opened this issue Jan 18, 2023 · 0 comments

PHP version(s) affected: 8.1.13

Package version(s) affected: 3.3.0

Description
In a Symfony 5.4-based application, multiple sites with separate domains share a single /public directory. For example:

  • 1.example.com
  • 2.example.com
  • 3.example.com

For each of these sites we run the following command (manually or via cron):

bin/console presta:sitemaps:dump --section site_1 --base-url https://1.example.com/ var/tmp/sitemaps
bin/console presta:sitemaps:dump --section site_2 --base-url https://2.example.com/ var/tmp/sitemaps
bin/console presta:sitemaps:dump --section site_3 --base-url https://3.example.com/ var/tmp/sitemaps

After the first command, for --section site_1, has completed, the index XML is updated as expected:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://1.example.com/sitemap.site_1.xml</loc>
    <lastmod>2023-01-18T16:20:49+01:00</lastmod>
  </sitemap>
</sitemapindex>

After the second command, for --section site_2, has completed, every domain in the index XML file changes. The urlset at https://2.example.com/sitemap.site_2.xml is correct and has the correct base URLs for all locations, but the index XML rewrites every URL to the most recent base URL:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://2.example.com/sitemap.site_2.xml</loc>
    <lastmod>2023-01-18T16:23:22+01:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://2.example.com/sitemap.site_1.xml</loc>
    <lastmod>2023-01-18T16:20:49+01:00</lastmod>
  </sitemap>
</sitemapindex>

After the third command, for --section site_3, has completed, every domain in the index XML file changes again. The urlset at https://3.example.com/sitemap.site_3.xml is correct and has the correct base URLs for all locations, but the index XML rewrites every URL to the most recent base URL:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://3.example.com/sitemap.site_3.xml</loc>
    <lastmod>2023-01-18T16:27:28+01:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://3.example.com/sitemap.site_2.xml</loc>
    <lastmod>2023-01-18T16:23:22+01:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://3.example.com/sitemap.site_1.xml</loc>
    <lastmod>2023-01-18T16:20:49+01:00</lastmod>
  </sitemap>
</sitemapindex>
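
To make the mismatch above easy to spot automatically, here is a minimal Python sketch (not part of the bundle) that flags index entries whose host does not belong to their section. The `EXPECTED_HOSTS` mapping and the assumption that the section name can be read from the `sitemap.<section>.xml` file name are illustrative, based only on the example output in this report.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical mapping of section name -> host, taken from the example domains.
EXPECTED_HOSTS = {
    "site_1": "1.example.com",
    "site_2": "2.example.com",
    "site_3": "3.example.com",
}

def mismatched_locs(index_xml, expected_hosts):
    """Return <loc> URLs whose host differs from the host expected for their section."""
    root = ET.fromstring(index_xml)
    bad = []
    for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        # Assumption: the file name looks like "sitemap.<section>.xml".
        filename = parsed.path.rsplit("/", 1)[-1]
        section = filename.removeprefix("sitemap.").removesuffix(".xml")
        if expected_hosts.get(section) != parsed.hostname:
            bad.append(url)
    return bad

# Index as reported after the site_3 run (lastmod omitted for brevity).
INDEX_AFTER_SITE_3 = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://3.example.com/sitemap.site_3.xml</loc></sitemap>
  <sitemap><loc>https://3.example.com/sitemap.site_2.xml</loc></sitemap>
  <sitemap><loc>https://3.example.com/sitemap.site_1.xml</loc></sitemap>
</sitemapindex>
"""

wrong_host = mismatched_locs(INDEX_AFTER_SITE_3, EXPECTED_HOSTS)
```

Here `wrong_host` contains the site_2 and site_1 entries, since both are listed under 3.example.com.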

The actual error occurs when the commands are run again, e.g. the next day to periodically regenerate the files. The new entry is added on top of the previous ones rather than replacing the existing entry for that section, so sitemap.site_1.xml is now listed twice:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://1.example.com/sitemap.site_1.xml</loc>
    <lastmod>2023-01-18T16:33:48+01:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://1.example.com/sitemap.site_3.xml</loc>
    <lastmod>2023-01-18T16:27:28+01:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://1.example.com/sitemap.site_2.xml</loc>
    <lastmod>2023-01-18T16:23:22+01:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://1.example.com/sitemap.site_1.xml</loc>
    <lastmod>2023-01-18T16:20:49+01:00</lastmod>
  </sitemap>
</sitemapindex>
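
The duplicated entry can be detected with a short script. The following is a minimal Python sketch, independent of the bundle, that lists every `<loc>` value occurring more than once in a sitemap index; the sample string is an abbreviated copy of the output above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def duplicate_locs(index_xml):
    """Return each <loc> value that appears more than once, in first-seen order."""
    root = ET.fromstring(index_xml)
    locs = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    counts = Counter(locs)
    return [url for url, n in counts.items() if n > 1]

# Index as reported after re-running the site_1 command (lastmod omitted).
INDEX_AFTER_RERUN = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://1.example.com/sitemap.site_1.xml</loc></sitemap>
  <sitemap><loc>https://1.example.com/sitemap.site_3.xml</loc></sitemap>
  <sitemap><loc>https://1.example.com/sitemap.site_2.xml</loc></sitemap>
  <sitemap><loc>https://1.example.com/sitemap.site_1.xml</loc></sitemap>
</sitemapindex>
"""

duplicates = duplicate_locs(INDEX_AFTER_RERUN)
```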

How to reproduce
I think the full description above should do it.

Possible Solution
Maybe tag each <sitemap> in the index XML with its section, such as <sitemap id="site_1">, and use that to decide whether to update an existing entry or add a new one?
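
The update-or-add idea can be illustrated with a small sketch. It is written in Python rather than the bundle's PHP and does not reflect how the bundle is implemented; `IndexEntry` and `upsert_section` are hypothetical names. Entries are keyed by section, so a re-run replaces that section's entry and leaves the other sections' base URLs untouched.

```python
from dataclasses import dataclass

@dataclass
class IndexEntry:
    loc: str
    lastmod: str

def upsert_section(index, section, base_url, lastmod):
    """Replace the entry for `section` if present, otherwise add it.

    Only the entry for the dumped section is touched; entries for other
    sections keep the base URL they were written with.
    """
    loc = f"{base_url.rstrip('/')}/sitemap.{section}.xml"
    index[section] = IndexEntry(loc=loc, lastmod=lastmod)

# Replay the runs from the description: site_1, site_2, then site_1 again.
index = {}
upsert_section(index, "site_1", "https://1.example.com/", "2023-01-18T16:20:49+01:00")
upsert_section(index, "site_2", "https://2.example.com/", "2023-01-18T16:23:22+01:00")
upsert_section(index, "site_1", "https://1.example.com/", "2023-01-18T16:33:48+01:00")
```

After the replay the index holds two entries: site_1 with the newer lastmod, and site_2 still pointing at 2.example.com.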

Additional Context
n/a

NiklasBr added the bug label Jan 18, 2023
NiklasBr changed the title from "Multi-domain errors cause root XML confusion" to "Multi-domain errors cause sitemapindex XML confusion" Jan 18, 2023