8265989: System property for the native character encoding name #3777

naotoj · 2021-04-28T22:24:31Z

After some internal discussion, we thought it was good to expose the native environment's default character encoding, which Charset.defaultCharset() is currently based on. This way applications will have a better migration path after the JEP 400 is implemented, in which Charset.defaultCharset() will return UTF-8, but the value of this new system property will remain intact. A CSR has been filed with more detailed information.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8265989: System property for the native character encoding name

Reviewers

Iris Clark (@irisclark - Reviewer)
Joe Wang (@JoeWang-Java - Reviewer) ⚠️ Review applies to fa5246c
Roger Riggs (@RogerRiggs - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3777/head:pull/3777
$ git checkout pull/3777

Update a local copy of the PR:
$ git checkout pull/3777
$ git pull https://git.openjdk.java.net/jdk pull/3777/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3777

View PR using the GUI difftool:
$ git pr show -t 3777

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3777.diff

naotoj · 2021-04-28T22:24:44Z

/csr

bridgekeeper · 2021-04-28T22:27:23Z

👋 Welcome back naoto! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2021-04-28T22:28:17Z

@naotoj this pull request will not be integrated until the CSR request JDK-8266075 for issue JDK-8265989 has been approved.

openjdk · 2021-04-28T22:29:17Z

@naotoj The following label will be automatically applied to this pull request:

core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2021-04-28T22:32:58Z

Webrevs

JoeWang-Java · 2021-04-29T00:37:37Z

src/java.base/share/classes/java/lang/System.java

@@ -699,6 +699,9 @@ public static native void arraycopy(Object src,  int  srcPos,
     *     <td>User's home directory</td></tr>
     * <tr><th scope="row">{@systemProperty user.dir}</th>
     *     <td>User's current working directory</td></tr>
+     * <tr><th scope="row">{@systemProperty native.encoding}</th>
+     *     <td>Character encoding name derived from the host environment and/or
+     *     the user's settings. Setting this system property has no effect.</td></tr>


May be "This property is read-only" instead of "Setting this system property has no effect" to not confuse with "user's settings"?

I suspect that if setProperty("native.encoding", "foo") succeeds, then it will return "foo". I also believe that a later invocation of getProperty("native.encoding") will also return "foo". If that's the case, then I don't think that the "read-only" alternative phrasing is correct. To me, the alternative suggests that an error will be return if there is an attempt to set it or that the potential new value will be ignored. The "no effect" phrasing avoids this problem. I also suspect that the "no effect" phrasing was selected to align with the apiNote in lines 719-721.

Thanks, Joe and Iris. I agree with Iris and that's the reason I chose the description. System properties are inherently mutable. There are some "protected" ones, by that I mean a private copy is made just after initialization for VM use, which is not affected by later setProperty() calls. But I don't think this new property should be treated as such.

The value of native.encoding should be a static property; that is, read once at startup and later modification
does not change the behavior of the Java runtime.

Whereas the value of native.encoding is derived from the value of native variables and those native
values do not change while Java is running, I think the behavior of the Java runtime should stay the same.
Unless it is static, the Java runtime will need to read the property every time it is needed and behavior can
change from call to call and actions in one thread can affect other threads.
Is there a use case where the application needs to change the encoding for every use in the Java runtime
independently of the native values.
Such an application should be explicit about its charset requirements and use APIs to select them explicitly.

Ok, understandable, and having a statement to clearly say 'no effect' would be indeed helpful. While working on methods with a charset parameter some time ago, I remember reading some user discussions about unable to programmatically change the default encoding, a confusion mostly as the user attempted to set encoding, e.g. System.setProperty("file.encoding", encoding), it seems that "the property is set (meaning no error), but it did not have the desired effect". Then there was also a mention in some tutorial where file.encoding and sun.jnu.encoding were recognized as read-only. The problem or confusion was that it appeared there’s such an API and the operation was successful, but it really didn’t have the effect.

JoeWang-Java · 2021-04-29T00:41:45Z

test/jdk/java/lang/System/PropertyTest.java

@@ -80,6 +80,7 @@
                {"user.dir"},
                {"java.runtime.version"},
                {"java.runtime.name"},
+                {"native.encoding"},


Does this test differentiate between read-only and modifiable properties? "native.encoding" looks like it's explicitly overridable. Or do we need a test to verify it's not overridable?

AlanBateman · 2021-04-29T07:17:26Z

src/java.base/share/classes/jdk/internal/util/SystemProps.java

+        var nativeEncoding = ((raw.propDefault(Raw._file_encoding_NDX) == null)
+                ? raw.propDefault(Raw._sun_jnu_encoding_NDX)
+                : raw.propDefault(Raw._file_encoding_NDX));
+        put(props, "native.encoding", nativeEncoding);


Shouldn't native.encoding be biased toward sun.jnu.encoding rather than file.encoding? Or maybe you'll change it when preparing the changes for JEP 400?

native.encoding preserves the encoding that current Charset.defaultCharset() is returning, which is based on file.encoding. So I believe the current implementation is correct. If it is biased toward sun.jnu.encoding, it would be problematic especially on Windows where that represents system locale.

Okay for now but the value of file.encoding will change to "UTF-8" so will be different to native.encoding.

Yes. That will be covered by JEP400.

mcimadamore · 2021-04-29T14:08:36Z

Naive question: any reason as to why we're not providing a new static API method in Charset to return the platform encoder? This initially will return same thing as Charset.defaultEncoder - but as JEP 400 is delivered the two will diverge. Any reason as to why we don't want to expose the platform encoder in the API?

mcimadamore · 2021-04-29T14:12:29Z

Naive question: any reason as to why we're not providing a new static API method in Charset to return the platform encoder? This initially will return same thing as Charset.defaultEncoder - but as JEP 400 is delivered the two will diverge. Any reason as to why we don't want to expose the platform encoder in the API?

Ok, I see this is addressed in the CSR:

We converged on a system property out of concern that an API method for the native encoding would be confusing for many developers.

The problem with this approach is that I think clients that need the platform encoder will have to stash it somewhere in a static field (to prevent reading property and parsing value over and over). It might also be harder for javadoc (I'm thinking of some of the Panama API) to express a dependency on it.

AlanBateman · 2021-04-29T14:41:18Z

Naive question: any reason as to why we're not providing a new static API method in Charset to return the platform encoder? This initially will return same thing as Charset.defaultEncoder - but as JEP 400 is delivered the two will diverge. Any reason as to why we don't want to expose the platform encoder in the API?

Agree an API would be better. We've looked at it a few times but have concerns that it would be confusing to most developers, esp. if it were another static method on Charset. System and Runtime have been examined too. The proposal here doesn't preclude adding an API in the future, it's mostly a question on whether it is needed and where would it be defined.

naotoj · 2021-04-29T16:32:03Z

Unless there is some use case other than for migration, I would not introduce a static method as Alan mentioned.

RogerRiggs

To support the statement that changing the property has no effect.
Please add it to the jdk.internal.util.StaticProperties cached values and an internal access method.

Otherwise looks good.

naotoj · 2021-04-30T21:58:29Z

To support the statement that changing the property has no effect.
Please add it to the jdk.internal.util.StaticProperties cached values and an internal access method.

Thanks. Added.

openjdk · 2021-05-04T04:07:20Z

@naotoj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8265989: System property for the native character encoding name

Reviewed-by: iris, joehw, rriggs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 169 new commits pushed to the master branch:

3544a9d: 8266391: Replace use of reflection in jdk.internal.platform.Metrics
020236c: 8264786: [macos] All Swing/AWT apps cause Allow Notifications prompt to appear when app is launched
45760d4: 8266320: (bf) ReadOnlyBufferException in heap buffer put(String,int,int) should not be conditional
ff65920: 8265491: Math Signum optimization for x86
55cc0af: 8266185: Shenandoah: Fix incorrect comment/assertion messages
880c138: 8265349: vmTestbase/../stress/compiler/deoptimize/Test.java fails with OOME due to CodeCache exhaustion.
001c514: 8265322: C2: Simplify control inputs for BarrierSetC2::obj_allocate
194bcec: 8265984: Concurrent GC: Some tests fail "assert(is_frame_safe(f)) failed: Frame must be safe"
1d9ea3a: 8266083: Shenandoah: Consolidate dedup/no dedup oop closures
80941f4: 8234446: Post-CMS workgroup hierarchy cleanup
... and 159 more: https://git.openjdk.java.net/jdk/compare/52f9d2297796e7157eeddb102a8896d7a7fad5d5...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

naotoj · 2021-05-04T17:29:12Z

/integrate

openjdk · 2021-05-04T17:30:35Z

@naotoj Since your change was applied there have been 180 commits pushed to the master branch:

8b37d48: 8255493: Support for pre-generated java.lang.invoke classes in CDS dynamic archive
770dfc1: 8265279: Remove unused RandomGeneratorFactory.all(Class category)
ee5bba0: 8265767: compiler/eliminateAutobox/TestIntBoxing.java crashes on arm32 after 8264649 in debug VMs
05e6017: 8265137: java.util.Random suddenly has new public methods nowhere documented
aa90df6: 8266187: Memory leak in appendBootClassPath()
b651904: 8266438: Compile::remove_useless_nodes does not remove opaque nodes
141cc2f: 8261527: Record page size used for underlying mapping in ReservedSpace
8e071c4: 8265784: [C2] Hoisting of DecodeN leaves MachTemp inputs behind
ce1bc9d: 8266432: ZGC: GC allocation stalls can trigger deadlocks
30ccd80: 8264950: Set opaque for JTooltip in config file of NimbusLookAndFeel
... and 170 more: https://git.openjdk.java.net/jdk/compare/52f9d2297796e7157eeddb102a8896d7a7fad5d5...master

Your commit was automatically rebased without conflicts.

Pushed as commit 4e96b31.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8265989: System property for the native character encoding name

fa5246c

openjdk bot added rfr Pull request is ready for review csr Pull request needs approved CSR before integration labels Apr 28, 2021

openjdk bot added the core-libs core-libs-dev@openjdk.org label Apr 28, 2021

JoeWang-Java reviewed Apr 29, 2021

View reviewed changes

irisclark approved these changes Apr 29, 2021

View reviewed changes

AlanBateman reviewed Apr 29, 2021

View reviewed changes

JoeWang-Java approved these changes Apr 30, 2021

View reviewed changes

RogerRiggs reviewed Apr 30, 2021

View reviewed changes

Added internal getter for native.encoding

e76ff41

Fixed a typo.

01b2ffc

irisclark approved these changes May 3, 2021

View reviewed changes

RogerRiggs approved these changes May 3, 2021

View reviewed changes

openjdk bot removed the csr Pull request needs approved CSR before integration label May 4, 2021

openjdk bot added the ready Pull request is ready to be integrated label May 4, 2021

openjdk bot closed this May 4, 2021

openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated labels May 4, 2021

openjdk bot removed the rfr Pull request is ready for review label May 4, 2021

naotoj deleted the JDK-8265989 branch May 4, 2021 17:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

8265989: System property for the native character encoding name #3777

8265989: System property for the native character encoding name #3777

Uh oh!

naotoj commented Apr 28, 2021 •

edited by openjdk bot

Loading

naotoj commented Apr 28, 2021

bridgekeeper bot commented Apr 28, 2021

openjdk bot commented Apr 28, 2021

openjdk bot commented Apr 28, 2021

mlbridge bot commented Apr 28, 2021 •

edited

Loading

JoeWang-Java Apr 29, 2021

irisclark Apr 29, 2021

naotoj Apr 29, 2021

RogerRiggs Apr 29, 2021

JoeWang-Java Apr 29, 2021

JoeWang-Java Apr 29, 2021

AlanBateman Apr 29, 2021

naotoj Apr 29, 2021

AlanBateman Apr 29, 2021

naotoj Apr 29, 2021

mcimadamore commented Apr 29, 2021

mcimadamore commented Apr 29, 2021 •

edited

Loading

AlanBateman commented Apr 29, 2021

naotoj commented Apr 29, 2021

RogerRiggs left a comment

naotoj commented Apr 30, 2021

openjdk bot commented May 4, 2021

naotoj commented May 4, 2021

openjdk bot commented May 4, 2021

8265989: System property for the native character encoding name #3777

8265989: System property for the native character encoding name #3777

Uh oh!

Conversation

naotoj commented Apr 28, 2021 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewers

Reviewing

naotoj commented Apr 28, 2021

bridgekeeper bot commented Apr 28, 2021

openjdk bot commented Apr 28, 2021

openjdk bot commented Apr 28, 2021

mlbridge bot commented Apr 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mcimadamore commented Apr 29, 2021

mcimadamore commented Apr 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AlanBateman commented Apr 29, 2021

naotoj commented Apr 29, 2021

RogerRiggs left a comment

Choose a reason for hiding this comment

naotoj commented Apr 30, 2021

openjdk bot commented May 4, 2021

naotoj commented May 4, 2021

openjdk bot commented May 4, 2021

naotoj commented Apr 28, 2021 •

edited by openjdk bot

Loading

mlbridge bot commented Apr 28, 2021 •

edited

Loading

mcimadamore commented Apr 29, 2021 •

edited

Loading