-
-
Notifications
You must be signed in to change notification settings - Fork 83
Fix Sphinx copyright year substitution #312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Sphinx copyright year substitution #312
Conversation
Using a Unicode en dash to separate years in the Sphinx `copyright` option results in unreproducible builds (in the sense of https://reproducible-builds.org/), because Sphinx's special handling of `SOURCE_DATE_EPOCH` only works when the separator is an ASCII hyphen. The Sphinx developers seem to have considered changing this and then decided against it (see sphinx-doc/sphinx#12738), so it seems best to adjust the configuration here.
@cjwatson I don't understand what problem this solves. Somebody made up some "best practice" about formatting copyright dates in documentation? If someone has a workflow based on parsing the copyright notice of the documentation, they should be brought to an awareness about the fragility of that assumption. Git it really good at marking and time-stamping changes to a collection of files, and tagged releases and/or pushes to PyPI should be used to decide changes in releases and their dates. If we are touching the copyright notice for the docs, then @jagerber48 and I should be added as authors, or perhaps just change that to "Uncertainties Developers". I vote to close. |
Sorry for apparently not explaining this very clearly. There's an effort called "reproducible builds" which is trying to improve the situation with users being able to verify that binary packages they download match the corresponding source code. A key part of that is being able to build the same source code at different times (though with the same toolchains, dependencies, and so on) and get exactly the same results. https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/uncertainties.html currently shows that when we tried to build the Debian package of uncertainties with the system clock artificially set to different times, we found differences in the results which amounted to the following in various places: - © Copyright 2010–2025, Eric O. LEBIGOT (EOL).
+ © Copyright 2010–2026, Eric O. LEBIGOT (EOL). The current Sphinx configuration for uncertainties sets the Sphinx has included support for Based on all the above, I'm waiting for confirmation from Debian's reproducible-builds testing machinery that adjusting the punctuation here fixes build reproducibility across year boundaries, but I think it should. If you think it's wrong to pick up the end date from the system clock, then changing to a fixed end date rather than Of course I'm not in a position to express any view on the question of who should be listed in the |
@cjwatson Why would a "reproducible build" process which clearly states (100% I know what this thing is):
be concerned about the copyright notice for the documentation? I'm sort of still leaning towards "Nope". Anyway, the whole "reproducible build" thing is inherently uninteresting for this project, as it is pure Python. There is no path from source code to binary -- there is nothing to verify. |
It is not in general sustainable or realistic to do reproducibility checks with a pile of exceptions for things that might be different, not least because small differences have a way of embedding themselves into nested file formats. We check reproducibility by confirming that the build outputs (which in this case includes the generated documentation) are bit-for-bit identical. By the same token, it's not sensible to have exceptions for packages that ostensibly have trivial build procedures. Sure, the process of going from the source tree for uncertainties to a binary output might be trivial. That does not prevent somebody from injecting code into a build machine (whether that's for wheels, .debs, RPMs, or whatever) that includes some kind of stealthy modification in the build outputs. Reproducible builds give us a way to detect that kind of attack, even if it's only after the fact, by running independent builds and comparing the outputs. And doing that is a lot more effective when the outputs can actually be reliably compared, without having to maintain a complex set of known exclusions. (This is just one package among many.) |
I'm a bit puzzled by this also but changing a single character imperceptibly seems low cost.. I just need to learn more about what the issue is. For example, if the issue is originating with @cjwatson how did this issue come to your attention for this package? Are you doing some scrape of many popular |
I was also puzzled at first because why does a Sphinx configuration affect what goes into the Python package, but looking here I see that debian actually packages the Uncertainties documentation as a separate package. I can also see there that @cjwatson helps maintain it. Personally, I don't mind changing which unicode character is used for the hyphen in the Sphinx copyright to the form that some part of Sphinx already assumes. |
A packaging system should have a copy of the source kit, and store some hash/checksum key of that kit to prevent alteration of the source code. It can build and install from that source kit any way it wants, including making any patches it wants. Why should we have to change our code to match the assumptions of some packaging system? |
Yeah so this is the problem. It's a bit insane that something depends on the exact punctuation and then repos are being asked to make these punctuation changes. One part of me says this is a minor change that we can make to make someone's life easier. But another part of me is tempted to reject this so that the right parties (sphinx and reproducible-builds) figure out a better resolution to eliminate the punctuation sensitivity. |
It sounds like right now the punctuation sensitivity in sphinx is retained so as not to break packages that are deliberately working around this in some way. But I guess now other package, who don't really care about this, are being asked to make changes. |
@jagerber48 While I don't have figures for everything using Sphinx anywhere, https://tests.reproducible-builds.org/debian/issues/unstable/copyright_year_in_documentation_generated_by_sphinx_issue.html suggests that this is a pretty rare issue at least in the set of packages in Debian. In fact, although I haven't looked very deeply, the other package listed there seems to be unreproducible for some other reason and not this one; so from what I can tell there is one significant package affected by this problem, certainly not thousands or probably even very many. I noticed this because I'm on the Debian Python packaging team, which deals with the bulk of the Python modules that have been packaged in Debian for one reason or another (in this case probably because it's an indirect dependency of a few applications that somebody wanted to make available in our distribution). There are a lot of these and we generally rely on dashboards to keep things manageable. One of the things on our dashboard is whether packages build reproducibly; the great majority do, but a few such as this one don't. @wshanks Yes, we often package documentation for various reasons. (Though I wasn't the one who set that up in this case; I've just been keeping the package up to date.) @newville We can and do apply patches, of course; in fact I already applied this patch to our package, so this PR isn't blocking me. Most upstream maintainers prefer it if we send patches upstream wherever possible, though; and often similar problems affect more than one downstream distributor. I don't want this to be an adversarial business. It's obviously up to you whether you take this patch; I can't really spend ages trying to persuade the maintainers of one package to take a one-character patch when we have thousands of packages to deal with! But it does seem as though perhaps the |
This was my first reaction. I did not actually know en dash existed, only em dash and hyphen. Looking up en dash, I see that one of its main uses is to separate dates, so I am less inclined to switch to hyphen, though I will personally just keep using hyphen rather than en dash in my own typing. It seems like Sphinx should reconsider its behavior and offer an explicit opt out rather than behaving differently for hyphen and en dash. I could see an argument for hard-coding the date. It is odd to think of someone else re-running the Sphinx code years from now and extending the copyright date. It is a chore to update it manually though. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #312 +/- ##
=======================================
Coverage 86.54% 86.54%
=======================================
Files 18 18
Lines 1903 1903
=======================================
Hits 1647 1647
Misses 256 256
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CodSpeed Performance ReportMerging #312 will not alter performanceComparing Summary
|
That's my thoughts too. This change is practically unnoticable.
Absolutely. I wonder how copyright dates are handled in other larger packages. I think we should either
|
I vote +1 to merge this with some minor changes. @cjwatson has taken time clearly spelling out the case for this minor change. I don't feel a strong need to "send a message" with our response to this PR. I would appreciate a minor message in the changelog pointing out how this is in response to a quirky behavior in sphinx. We can refer to sphinx-doc/sphinx#12738 in the changelog even. I think the discussion in this PR should be referenced also. @cjwatson if you don't want to do that then we can update the changelog ourselves when we make the next release, but the details are fresh in everyone's heads now so it will be easier. If need be, we can clean up the formatting etc. ourselves to conform to our loose conventions at release time. But, I do think the sphinx behavior is essentially a bug and I'm frustrated that we had to have this conversation here and spend time/energy so that other packages can somehow rely on this buggy behavior as "backwards compatible" feature. So I would appreciate if an issue was opened with sphinx re-visiting sphinx-doc/sphinx#12738. I would be happy to put a comment on such an issue expressing this point. |
Maybe the best thing to do first is to loop in @jayaddison on this to see if I've missed anything in my analysis, or if they think it's likely that Sphinx might want to change anything about its handling of copyright years in light of this PR. |
Hello - Sphinx is doing its best to enable reproducibility for documentation projects with programmatic copyright notices; possible, as here, because We've been aware of this limitation/problem, and from Sphinx v8.1.0 onwards there is a (note regarding sphinx-doc/sphinx#12738: I don't have any magic tooling to assess the impact of changes in Sphinx itself on downstream projects (some of which will be closed-source) -- but the reason I pushed back on that was that I'm aware of at least one (open-source) project that intentionally did the opposite - they began using Edit: en-dash, not em-dash; thanks, @wshanks. |
Mea culpa: I've tested my own recommendation, and using the I'm sorry about that; I'd forgotten how these features related to each other. |
At the risk of being spammy: the change that I had hoped to suggest, before finding that it does not work, was: index 28dadeb..5412ee4 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -44,7 +44,7 @@ master_doc = "index"
# General information about the project.
project = "uncertainties"
-copyright = f"2010–{date.today().year}, Eric O. LEBIGOT (EOL)"
+copyright = f"2010–%Y, Eric O. LEBIGOT (EOL)"
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the I have opened a feature/change suggestion thread to make such a change possible, and would welcome any thoughts/support/criticism, at: https://github.com/orgs/sphinx-doc/discussions/13526 |
I am still -1 -- willing to be out-voted. The current code is not broken in a way that we see in our tests or usage. This is a patch for their code and assumptions, not ours. There are no guarantees that we won't "break" this at some point in the future. We are not obligated to support others building the docs or guarantee that building the docs will yield byte-for-byte compatible results. As an example: the lmfit-py documentation uses the Sphinx gallery package to generate results that includes the run time to milliseconds and auto-generated images (see https://lmfit.github.io/lmfit-py/examples/example_fit_with_inequality.html#sphx-glr-examples-example-fit-with-inequality-py for example). It is not byte-level reproducible. The assumption that rendered docs must be byte-level reproducible is flawed. Again, willing to be out-voted. |
But this seems odd. Having code reliant on whether a certain character is an en-dash or a hyphen is very fragile and it seems a bit crazy to have or support any code that is reliant in that way. I don't fully understand the
This is a good point. I fully agree with this. Especially
Maybe I haven't yet been burned enough by other people's demands on the codebase I maintain, but for this one I'm willing to just pass it through to make them happy. Obviously if another reproducible build issue comes up I would be less willing to comply for tenuous justification... |
The Wikipedia entry for en dash lists connecting ranges as the first use case for en dash, so the current usage is valid. On the other hand, even the Wikipedia entry notes that some style guides favor hyphen over en dash:
So changing it is not a complete violation of punctuation convention. I agree though that requiring this kind of minor change does not seem like the right way to achieve something like reproducible builds because it does not scale well , but I know reproducible builds is a long-running effort and probably anything I could come up with on the fly has already been considered. It seems like we have a -1 vote and a +1 vote and then two ambivalent votes leaning towards +1, so we will probably merge this. If it seemed like we would vote it down, I would suggest we make the change to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. Let's just clear this out. Will merge in the next couple days unless someone else does.
Using a Unicode en dash to separate years in the Sphinx
copyright
option results in unreproducible builds (in the sense of https://reproducible-builds.org/), because Sphinx's special handling ofSOURCE_DATE_EPOCH
only works when the separator is an ASCII hyphen. The Sphinx developers seem to have considered changing this and then decided against it (seesphinx-doc/sphinx#12738), so it seems best to adjust the configuration here.
pre-commit run --all-files
with no errors