Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Traditionally, Linux (and Unix) filesystems have always considered file names as an opaque byte sequence without any special meaning, requiring users to submit the exact match of the file to find it in the filesystem. But that is not how humans operate. When people write titles, 'important report.ods' and 'IMPORTANT REPORT.ods' usually mean the same piece of data, and you don't care how it was written when creating it. We care about the content and the semantics of the words IMPORTANT and REPORT.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 27, 2020 23:05 UTC (Thu) by donbarry (guest, #10485) [Link] (10 responses)
The place for this is not in a filesystem, it's in higher-level interfaces to it. A filesystem needs to be rigorous in minimizing "gotchas", because it has many layers depending on it. Monkeying around with semantics is better done in these higher layers, which have a far smaller list of software that can be broken by their changes and can evolve with it.
The long shadow of Windows and its choices continues to haunt.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 6:46 UTC (Fri) by warrax (subscriber, #103205) [Link] (6 responses)
The only way it could possibly work is if it were as 'embedded in the fabric of everything *NIX' as e.g. libc is (or via POSIX mandate, perhaps?), but that's not going to happen. Plus, you still need to track different encodings/casing rules for different file systems, e.g. USB sticks, so it needs to exist in the data *somehow*... and a file system seems about right for that, practically. Ideally, you'd do it per filename, but that's probably impractical considering everything on *NIX treats filenames as bytestrings. So here we are.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 7:05 UTC (Sat) by gfernandes (subscriber, #119910) [Link] (5 responses)
It would seem to me this is a totally spurious development. Gnome 3 already searches for files ignoring case. You can simply press the Windows key and start typing the file name - et voilà! Your file is one of the results.
So why does anyone need this?
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 19:52 UTC (Sat) by t-v (guest, #112111) [Link] (4 responses)
Because the filesystem defines the mapping of names to files.
Suggesting that it should be handled elsewhere implies that some tools then will interpret filenames differently to others. It also means that you can have two files with filenames that look distinct to some tools and not distinct to others.
I can see how it is a complex feature and not everyone wants it for everything (and there is good reason it's optional, right?), but the kernel (filesystem or vfs or whatever) certainly seems like the natural place to put this abstraction.
In the end, as long as realpath canonicalizes paths, it would seem to be not even breaking that many guarantees you rely on.
I'm a bit surprised people can get all worked up about this feature.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 20:23 UTC (Sat) by gfernandes (subscriber, #119910) [Link] (3 responses)
I'd think that's some pretty good reason to **not** confuse the matter by making names case insensitive?
If **you** are confused with how you name files, use Gnome3, set up a bash alias for find - several ways to deal with that.
Making the filesystem case insensitive seems a bit unnecessary.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 19:45 UTC (Sun) by Wol (subscriber, #4433) [Link] (2 responses)
And you have NO GUARANTEES WHATSOEVER that other apps won't tamper with filesystem directory structure behind your back, invalidating your map in the process ...
Isn't mapping filenames to files one of the main jobs for a filesystem? ALL filesystems enforce a "set of valid characters" rule - even *nix! Why *shouldn't* a filesystem declare a canonical list? Just say that the canonical name can't contain eg upper case, and then allow aliases that are stored in the same directory entry eg "what the user entered" as opposed to "what the filesystem transmogrified it to".
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 5:30 UTC (Fri) by gfernandes (subscriber, #119910) [Link] (1 responses)
I don't think I said that.
What I _did_ say is that Gnome _indexes_ your files and _allows_ searching in a case insensitive manner.
So why the song and dance when it's a non feature?
Not that it affects me in the least - I've been on btrfs for quite some time now.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 5:31 UTC (Fri) by gfernandes (subscriber, #119910) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 8:50 UTC (Fri) by oldtomas (guest, #72579) [Link]
Now, what's the fraction of Unicode which has a notion of "case"? What's the fraction of humanity whose native language has? (the second might be one or two orders higher, still it's probably less than 0.5).
Still: does that justify Rube-Goldberging that mess into a kernel? IMHO: no.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 6, 2020 13:34 UTC (Sun) by jond (subscriber, #37669) [Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 6, 2020 20:28 UTC (Sun) by zlynx (guest, #2285) [Link]
From the amount of capitalization mistakes and outright spelling errors in things like Bash shell scripts I'm convinced they semi-randomly hammered the keyboard then relied on IDE support and auto complete for everything when writing code.
If they hadn't been forced to deploy on Linux servers who knows how bad it would have become.
color me sceptical
Posted Aug 27, 2020 23:31 UTC (Thu) by gus3 (guest, #61103) [Link] (3 responses)
It's just another question to be answered as the semantics are clarified.
color me sceptical
Posted Aug 27, 2020 23:38 UTC (Thu) by krisman (subscriber, #102057) [Link] (1 responses)
> originally named "Floss" be looked up using the name "Floß"? I'm not so sure.
The article is more of a higher-level overview, and the floß example serves to illustrate what we mean by the complexity of non-English languages. I didn't mean to show the strict semantics with that one :)
If you check documentation it will show we use Unicode's canonical decomposition for normalization (NFD) with small modifications, documented in ./admin-guide/ext4.rst
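A rough userspace illustration of "fold case, then normalize to NFD" is sketched below. It assumes Python's `str.casefold` and `unicodedata.normalize`, which follow the Unicode tables shipped with the interpreter; the kernel uses its own tables and the small modifications mentioned above, so its answers may differ, notably for ß. The helper names `lookup_key` and `same_name` are hypothetical.

```python
import unicodedata


def lookup_key(name: str) -> str:
    """Hypothetical helper: fold case, then apply canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", name.casefold())


def same_name(a: str, b: str) -> bool:
    """Two names are treated as equal when their folded, normalized keys match."""
    return lookup_key(a) == lookup_key(b)


# Plain ASCII case difference.
print(same_name("IMPORTANT REPORT.ods", "important report.ods"))
# Precomposed e-acute versus "e" followed by a combining acute accent.
print(same_name("caf\u00e9", "cafe\u0301"))
# Python's full case folding maps sharp s to "ss"; the kernel may not.
print(same_name("Flo\u00df", "Floss"))
```

Under Python's rules all three comparisons report a match; whether ext4 agrees on the last one is exactly the kind of detail the admin guide has to settle.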
color me sceptical
Posted Aug 28, 2020 0:25 UTC (Fri) by gus3 (guest, #61103) [Link]
Thank you for your quick reply!
color me sceptical
Posted Sep 1, 2020 17:15 UTC (Tue) by nilsmeyer (guest, #122604) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 27, 2020 23:55 UTC (Thu) by dullfire (guest, #111432) [Link] (63 responses)
If it's not a cli I can't see how it matters. GUI software will display the correct things, and if there's a search, it can default to case-insensitive (if that's a sane default for the expected user base).
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 5:03 UTC (Fri) by xanni (subscriber, #361) [Link] (4 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 18:34 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (3 responses)
A number of people on (mostly) Windows have figured out that the most effective way to mod Skyrim is to build a UnionFS-like-thing in userspace. This allows you to install lots of mods over the same basic directory structure, and when the game goes looking for an asset, it transparently finds the mod that wants to edit that asset, without having to know anything about the mods themselves. Unfortunately, Windows is case-insensitive, so most mods use a random mixture of capitalization in their directory structures (which need to match up 1:1 with the game's native directory structures, or else asset lookups will fail). If you wanted to recreate this setup on Linux, you'd need to put case folding in the mod manager's UnionFS implementation (which was designed to run on Windows, in userspace, and has no idea that it has to fold case).
(Disclaimer: I have never tried to do this on Linux, so I have no idea how, or if, they actually managed to solve this problem in Wine et al.)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 14:11 UTC (Tue) by niner (subscriber, #26151) [Link] (2 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 21:59 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)
More to the point, however, this is a layering violation of its own, and arguably a much worse one than a case-insensitive ext4 would be. Wine wants to recreate a Windows-compatible environment, not individually hack a million separate apps to work right in an incompatible environment. If the choice is between "reach inside the guts of every single app that presumes a case-insensitive filesystem, and fiddle around with it until it works," and "maintain a relatively straightforward out-of-tree ext4 patch," the latter is probably a lot less work than the former. Bear in mind, of course, that many of those apps are closed-source, but ext4 is not.
So, in this hypothetical where ext4 never grew a case-insensitive mode, you eventually reach the point where they have a stable out-of-tree patch that people are actually using to solve a real problem. Then the logical next question is, who exactly benefits from the patch being out-of-tree?
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 0:09 UTC (Wed) by floppus (guest, #137245) [Link]
There are certainly advantages, in performance and consistency, to doing the work in the kernel instead, but it would be much less convenient if Wine *required* ~/.wine/ to be stored on a case-insensitive filesystem.
After all, the purpose of Wine is not just to create a Windows-compatible environment, but to create a Windows-compatible environment inside a Unix-like OS.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 5:46 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (57 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 8:30 UTC (Fri) by ledow (guest, #11753) [Link] (10 responses)
Pretty much every compiler has had a patch for this at some point but just wants to push it down to the filesystem, and in that case (a cross-platform compiler dealing with cross-platform code) I'm not at all sure that the local filesystem is the right place to handle it.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 9:29 UTC (Fri) by Sesse (subscriber, #53779) [Link] (9 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 12:20 UTC (Fri) by gutschke (subscriber, #27910) [Link] (1 responses)
This means, the compiler will only ever need to execute the slow path in a small number of cases. That's fine. 99% of the time, it can just open the file that the user requested. And in the rare exception, it scans the directories and prints a warning message. Subsequently, the results can be cached.
Much saner than making system-wide configuration changes for the benefit of a single defective source file
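The fast-path/slow-path idea in this comment can be sketched roughly as follows. This is an illustration only, not how any real compiler resolves includes: `resolve_include` and its cache are hypothetical names, only a single directory level is handled, and the cache is never invalidated when the directory changes.

```python
import os
import warnings
from typing import Dict, Optional, Tuple

# Hypothetical cache: (directory, folded name) -> resolved path, or None if absent.
_case_cache: Dict[Tuple[str, str], Optional[str]] = {}


def resolve_include(directory: str, name: str) -> Optional[str]:
    """Try the exact name first; fall back to a case-insensitive directory scan."""
    exact = os.path.join(directory, name)
    if os.path.exists(exact):
        # Fast path: the name is spelled as it is stored on disk.
        return exact

    key = (directory, name.casefold())
    if key not in _case_cache:
        # Slow path: linear scan of the directory, done once per (dir, name).
        match = None
        try:
            entries = sorted(os.listdir(directory))
        except OSError:
            entries = []
        for entry in entries:
            if entry.casefold() == key[1]:
                match = os.path.join(directory, entry)
                break
        _case_cache[key] = match

    found = _case_cache[key]
    if found is not None:
        warnings.warn(f"{name!r} matched {found!r} only by ignoring case")
    return found
```

Sorting the directory listing makes the choice deterministic when both `foo` and `Foo` exist, but it is still an arbitrary choice, which is the ambiguity raised elsewhere in this thread.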
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 13:29 UTC (Fri) by jreiser (subscriber, #11027) [Link]
> 99% of the time, it [the compiler] can just open the file that the user requested.
The search for a #include file often fails through many directories (much of the -I list, both explicit and implicit) before finding the right one. So the speed penalty of case-insensitivity "everywhere" will be noticeable.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 17:02 UTC (Fri) by kreijack (guest, #43513) [Link] (6 responses)
> match against them one by one and find the one that matches the best.
This is true both for the kernel implementation and for the user space implementation.
> And what if there's both foo and Foo?
It can't happen. To mark a directory "case insensitive", it has to be empty; and after a directory is marked "case insensitive" foo and Foo can't exist at the same time.
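The invariant described here can be modelled in a few lines. The class below is a toy in-memory model, not the ext4 implementation: `CasefoldDir` is a hypothetical name, `create` behaves like an exclusive create, and Python's `str.casefold` stands in for the kernel's folding rules.

```python
class CasefoldDir:
    """Toy model of a case-insensitive, case-preserving directory."""

    def __init__(self):
        # Folded name -> (name as originally created, file contents).
        self._entries = {}

    def create(self, name, data=b""):
        """Exclusive create: fails if any spelling of the name already exists."""
        key = name.casefold()
        if key in self._entries:
            existing = self._entries[key][0]
            raise FileExistsError(f"{name!r} collides with existing {existing!r}")
        self._entries[key] = (name, data)

    def lookup(self, name):
        """Fold once, then do a hash lookup; raises KeyError if absent."""
        return self._entries[name.casefold()]

    def listdir(self):
        """Names are reported with the capitalization used at creation time."""
        return [stored for stored, _ in self._entries.values()]


d = CasefoldDir()
d.create("foo", b"first")
print(d.lookup("FOO"))   # finds the entry created as "foo"
try:
    d.create("Foo")
except FileExistsError as err:
    print(err)
```

Because the model starts empty and every insertion goes through the folded key, two names differing only in case can never coexist, which mirrors why the real directory has to be empty when the flag is set.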
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 15:04 UTC (Sat) by Sesse (subscriber, #53779) [Link] (3 responses)
The kernel implementation can case-fold and then do a better-than-linear lookup after that (e.g. in the dentry cache), the user-space implementation cannot.
> It can't happen. To mark a directory "case insensitive", it has to be empty; and after a directory is marked "case insensitive" foo and Foo can't exist at the same time.
This was in response to the comment that suggested _not_ to use the kernel case insensitivity support, but instead build the logic into the compiler. So the compiler would have to handle this case.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 19:23 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (2 responses)
Strictly speaking, it is possible for userspace to maintain a slightly-out-of-date index of each directory. This still requires a linear scan of each directory, but you can do it asynchronously, and then keep it up to date with inotify events.
However, a compiler should not be in the business of maintaining such an index, and it would be redundant to the kernel's internal data structures in any event. Moreover, I for one do not want to have yet another layer I need to consult whenever the compiler fails to find my shiny new foo.h file. So pushing this off on userspace is definitely a Bad Idea.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 7:04 UTC (Sun) by gfernandes (subscriber, #119910) [Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 16:06 UTC (Sun) by Wol (subscriber, #4433) [Link]
blah blah blah -GNOME
There's no guarantee that gnome will be on the system. And if by user-space you mean samba, then there's a good chance there'll be no screen hence no requirement for gnome.
And if you don't want two files to have the same case-insensitive file name, you have to enforce it at the directory level - any attempt to enforce it higher up risks something bypassing your checks!
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 3:55 UTC (Tue) by rsidd (subscriber, #2582) [Link] (1 responses)
What happens when you do "cp bar/* baz/" and baz is marked "case insensitive" and bar contains both foo and Foo? Does the first to be copied get overwritten by the second in baz? Or is there an error? What if foo and Foo are directories? Does everything in the two directories end up in one directory in the destination?
I suppose the answer is "don't do that" and "don't enable case insensitive unless you really really need it..."
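Whatever the copy ends up doing, the collision can be detected before it happens. The sketch below groups names by their folded form and reports any group with more than one member; `casefold_collisions` is a hypothetical helper, and Python's `str.casefold` is only an approximation of the rules the filesystem applies.

```python
from collections import defaultdict


def casefold_collisions(names):
    """Return groups of names that differ only by case (per Python's folding)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with two or more spellings would clash in the destination.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(casefold_collisions(["foo", "Foo", "README", "notes.txt"]))
```

Running it over `os.listdir("bar")` before copying into a case-insensitive `baz` would flag `foo` and `Foo` as a pair that cannot both survive the copy.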
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 10:38 UTC (Tue) by Wol (subscriber, #4433) [Link]
do a "cp -v --no-clobber" if you don't want that to happen. Remember *nix does what you tell it, not what you meant to tell it ...
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 9:17 UTC (Fri) by tchernobog (guest, #73595) [Link] (38 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 11:29 UTC (Fri) by thumperward (guest, #34368) [Link] (34 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 13:58 UTC (Fri) by warrax (subscriber, #103205) [Link] (30 responses)
(If I were being uncharitable, I would just chalk it up to being averse to any change.)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 14:14 UTC (Fri) by thumperward (guest, #34368) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 19:37 UTC (Sat) by NYKevin (subscriber, #129325) [Link] (28 responses)
Whenever I hear a software engineer gripe about "complexity," it almost always means "complexity in the layer I'm responsible for." Nobody ever talks about the overall system's complexity, with the result that the system tends towards a maximum of complexity as engineers push responsibilities off on one another. All that complexity-shifting means a lot more data has to flow between the system's various layers, which increases the overall complexity over time.
This is not the fault of the engineers. The process has been going on for so long that almost nobody can hold the entire system in their head at a time (particularly when you start talking about systems larger than one Unix box, such as a distributed system). Everyone only "sees" the complexity nearest them, and they just tacitly assume that the other layers will be fine. So of course this newly-discovered complexity belongs in another layer. My layer takes X and turns it into Y. This complexity is of type Z, which *obviously* needs to be transformed into X before I can handle it. So clearly, the complexity belongs in the layer above me.
To make matters worse, it's often difficult to see the difference between unhelpful complexity-shifting and helpful refactoring. They are, after all, basically the same process. But since (as discussed above) nobody has a global view of the system, it's really hard to see whether moving complexity from A to B is going to reduce or increase overall system complexity. So it ends up being a game of office politics (whether the work is being carried out in an office or on a mailing list), because that is something humans are capable of comprehending. And office politics has the well-known problem of being completely toxic.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 19:14 UTC (Sun) by Wol (subscriber, #4433) [Link] (26 responses)
Actually, if you ANALYZE the problem, you can usually work out where the complexity belongs. All too often engineers have an itch or a problem, and want an immediate solution. So either the wrong layer claims the problem, or the right layer has no desire to solve it.
This is my problem with RDBMSs. They've defined "data" to make the problem easy for computers. With the result that lists - data - have been pushed into the data management layer when they belong in the data storage layer. And now we have to mix meaningful and meaningless data together so we can recreate lists. If we have a list-capable DBMS (Pick, anyone :-) we can convert a list to a set by throwing away information. But with an RDBMS we can't recreate a list from a set, without storing loads of metadata in the data layer :-(
But actually, that problem is glaringly obvious from the act of normalisation ... analyse the problem and you see it ...
I always say it's fine just solving the bit of the problem that you want/need. But if you don't analyse the *whole* problem-space, fixing part of it now is likely to cause grief for anybody stumbling across a different part of it in the future.
I agree about office politics, though ...
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 21:19 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (21 responses)
>
> But actually, that problem is glaringly obvious from the act of normalisation ... analyse the problem and you see it ...
Arguably, this is a matter of opinion, and your belief about whether a list should be a valid data type will depend on how you feel about pragmatism vs. formalism, type theory vs. set theory, whether nontrivial VIEWs are useful, and whether ORMs were a Good Idea. So, basically the same office politics I was just decrying.
My point (which I could have made clearer) was that there is often no "right" answer to these questions, and even if an answer may be "right" in a particular context, it probably fails to generalize to other cases (whose priorities and business requirements may differ). For example, a project might want to be very sure that every piece of data is represented by one and only one entity in the database (a "single source of truth"), so that you cannot have different parts of the database fall out of sync with one another due to faulty application logic. Normalization is specifically designed to solve that problem, and rejecting it makes it harder, or even impossible, to provide that sort of guarantee. Another project might, just as validly, not care as much about guarding against application bugs, because their data is less sensitive to integrity problems, or because they are already taking greater care at the application layer and do not need the DB to double-check their homework (see also CHECK constraints, triggers, stored procedures, etc.). These are both equally valid opinions which may be appropriate to different situations, but it's very hard to have an evidence-based discussion around which of those two scenarios you're actually living in.
In practice, my understanding is that SQL defines an ARRAY type which is at least minimally functional, but it may not be as performant as you might like, depending on what you are trying to do with it. If my understanding is correct, then this has more to do with the standards of your particular codebase (i.e. "are we allowed to use ARRAY?") than with the RDBMS itself. In the worst case scenario, you can always serialize your weird data to BLOB, with the caveat that, obviously, the database doesn't know what's in a BLOB and can't do anything useful with them other than spitting the same bytes back out again when you ask for them.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 23:32 UTC (Mon) by Wol (subscriber, #4433) [Link] (20 responses)
> Arguably, this is a matter of opinion, and your belief about whether a list should be a valid data type will depend on how you feel about pragmatism vs. formalism, type theory vs. set theory, whether nontrivial VIEWs are useful, and whether ORMs were a Good Idea. So, basically the same office politics I was just decrying.
Well, how do you store "order" in a set? Answer: you can NOT. Yes you can create an "order" field, and stick something in it, but unless you can stick "first", "second" etc then it's not data, it's META data. And as soon as you start MIXing data and metadata you have a massive problem. And as soon as you want to insert or delete an item from the list, you also have a massive problem.
imho, if you can't store a list in a set, then list must be a datatype. And as I said, seeing as you can create a set from a list by throwing away information, but you can't go the other way and create a list from a set, (At least, not without having extraneous information about knowing how to sort the field called "order") that means a list is a superset of a set (and a bag), and can therefore replace both of them.
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 2:15 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (19 responses)
You can't have it both ways. If the order is "extraneous," that means you don't care about the order, so then you don't store it in the first place. If it's not extraneous, then it's not extraneous, so you store it like any other data. It can't be simultaneously extraneous at the data storage layer but important at the business logic layer, because that's not how data storage works.
I recognize, of course, that changing the order is a more difficult problem, and in many cases, you may need to resort to the ARRAY type in practice. This is particularly likely to be a reasonable choice if the information will not be reused anywhere else in the system, so that you don't lose very much safety by failing to properly normalize it. But to claim that you can't build lists out of sets is absurd; sets are the foundation of mathematics, and you can absolutely build lists out of them (which mathematicians tend to refer to as "tuples" or "n-tuples" for integer n).
And, again, I reiterate that your choice not to use ARRAY is your own self-constructed problem; it's in the SQL standard, which every real RDBMS supports, so use it if you want it.
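The "store the order like any other data" approach can be sketched with Python's bundled `sqlite3` module. The table and column names (`invoice_line`, `position`) and the sample items are made up for illustration; SQLite is used only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE invoice_line (
        invoice_id INTEGER NOT NULL,
        position   INTEGER NOT NULL,   -- the list index, stored as ordinary data
        item       TEXT    NOT NULL,
        PRIMARY KEY (invoice_id, position)
    )
    """
)

# Store a list: each element becomes one row tagged with its index.
items = ["widget", "gasket", "sprocket"]
conn.executemany(
    "INSERT INTO invoice_line VALUES (1, ?, ?)", list(enumerate(items))
)


def read_list(conn, invoice_id):
    """Rebuild the list by sorting the rows on the stored position."""
    rows = conn.execute(
        "SELECT item FROM invoice_line WHERE invoice_id = ? ORDER BY position",
        (invoice_id,),
    )
    return [item for (item,) in rows]


print(read_list(conn, 1))
```

Reading the list back is a single ordered query. Inserting into the middle means renumbering the later rows, which is the cost the other side of this argument objects to.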
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 10:35 UTC (Tue) by Wol (subscriber, #4433) [Link] (18 responses)
> You can't have it both ways. If the order is "extraneous," that means you don't care about the order, so then you don't store it in the first place. If it's not extraneous, then it's not extraneous, so you store it like any other data. It can't be simultaneously extraneous at the data storage layer but important at the business logic layer, because that's not how data storage works.
If I have a field called "colour", then the contents of the field have meaning. If I have a field called "order" then the contents of that field are pseudo-random garbage. THAT is the problem.
> And, again, I reiterate that your choice not to use ARRAY is your own self-constructed problem; it's in the SQL standard, which every real RDBMS supports, so use it if you want it.
Please (a) read up on what an RDBMS is. *NO* "real" RDBMS supports arrays - they are forbidden by C&D (yes I know what is marketed as an rdbms supports arrays). And (b) please read what I wrote - it should be pretty obvious I do not (from choice) use rdbms's - I use databases whose natural format is not columns but arrays.
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 16:14 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)
That's incorrect. The relational algebra doesn't actually care about the data types in rows.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 16:43 UTC (Tue) by Wol (subscriber, #4433) [Link] (10 responses)
copied from wikipedia ...
> Rule 0: The foundation rule:
> For any system that is advertised as, or claimed to be, a relational data base management system, that system must be able to manage data bases entirely through its relational capabilities.
> Rule 2: The guaranteed access rule:
> Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
In other words, you can't store an array in a cell and call it an RDBMS. (And note I did say "according to Codd & Date" ...)
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 16:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)
You absolutely can. An array is just treated as an atomic value. Relational algebra actually doesn't care about data types, as long as it's possible to construct a selection operation (basically, a predicate) on top of them.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 18:00 UTC (Tue) by Wol (subscriber, #4433) [Link] (8 responses)
What do you do if that array is a list of foreign keys? Because Pick is quite happy doing its equivalent of a join on that array, while if the RDBMS treats it as an atomic value, it can't do a join ...
Again, this is another case of inefficiency caused by RDBMS design, because in Pick it doesn't care whether an attribute is a single foreign key or a list of them, while in an RDBMS you have to split a list out into a separate table - or use some sort of hoisting logic. And if you're using hoisting logic you're breaking up a single atomic object into multiple atomic objects ... wtf ...
Just use a DB that is natively list friendly ... :-) Once again, this is bashing the real world into your favourite mathematical model. As Dick Feynman pointed out, "nature cannot be fooled", and the result is rarely nice. SQL is the Pascal of databases - it forces you to follow its rules. And like Pascal, it's a lot harder to program in than languages that actually try to match the real world. I've said this before - a Pick programmer can hold in his head a database schema that will cover a wall in SQL and have SQL programmers running for cover ... BECAUSE the Pick schema actually tries to be a close approximation to the real world.
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 19:46 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)
Nothing whatsoever stops regular DBs from doing this. PostgreSQL, MSSQL, Oracle all support arrays and other complex data types.
> Because Pick is quite happy doing its equivalent of a join on that array, while if the RDBMS treats it as an atomic value, it can't do a join ...
Formally, any "join on array" can be rewritten without it (it would just make predicates a bit more complicated). However, in practice all the RDBMS support making joins on custom data.
In short, Pick is nothing special whatsoever. It's an obsolete DB for those who prefer to live in the past.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 21:35 UTC (Tue) by Wol (subscriber, #4433) [Link] (6 responses)
And that complexity is IN THE WRONG PLACE. (Which is where this whole discussion started.)
How do you do it while complying with C&D? Because either the list is an atom, in which case it complies but you can't do the join, or you can split the list into its constituent atoms in which case it can't comply because it's not an atom.
It's like I compared the columns "colour" and "order". One contains meaningful values, the other contains pseudo-random garbage. That's added complexity for the programmer - why should he be able to mangle some columns and not others?
Imho the whole problem starts with C&D's statement that says "data comes in rows and columns". In other words, it declares what is acceptable, and anything that doesn't comply needs a data analyst with a sledge hammer to bash round pegs into square holes.
On the other hand, *I* define data as "what the user gives me" and a LOT of that comes as lists. I also define metadata as "anything I can deduce from the data". And Pick makes it *easy* to keep those two different things *separate*. As soon as I have a list, C&D *forces* me to mix and muddle the two. And seriously, how much data/information comes from the real world as a set? Collections of real world objects! Everything *about* an object comes as a list, comes with order, even if said order is random and doesn't really matter. Take the invoice I go on about - the order of line items on an invoice or in a ledger may be random, but that order is an *extremely important* attribute of an invoice!
You may think Pick is obsolete, but why are Pick databases so easy to understand, while relational databases turn your brain to mush? It's because Pick maps pretty closely to the real world. In Pick, the FILE maps to an object definition, the RECORD maps an instance of said object, and the ATTRIBUTE maps to, well, the attribute(s) of said object. Whereas what does a relational table map to? It depends ... What does the row map to? It depends ... What does the column map to? That's easy, an attribute.
And if I wanted to, I could easily map my FILE to your table, my RECORD to your row, and my ATTRIBUTE to your column. Bingo, I've just implemented a relational database in Pick. You can't implement a Pick database in C&D, it won't let you! What's the rule? "The generic always trumps the specific". My N-dimensional database trumps your 2-dimensional one! I can look to the world like a 2-dimensional relational database in every respect other than the fact I can outperform it for speed EVERY TIME.
You may be right in that the marketing for relational has trumped everything else and seized the market share mindshare. But you know what? Anything you can do, I can do it with half the resources (or less) because Pick does so much that relational can't. I will admit, though :-) , that anybody who implements a Pick database without using relational maths to design the system is setting themselves up for failure! :-) The maths is good, but implementing it as a 2-dimensional DBMS is just PLAIN STUPID! The world isn't 2-dimensional ..
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 7:42 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)
You seem to not understand what you're arguing against.
Relational algebra doesn't care about the complexity of individual column types. They can be JSONs, arrays, whatever. The data type makes no difference, as long as it's possible to use it for https://en.wikipedia.org/wiki/Selection_(relational_algebra)
So nobody stops you from writing: "select t.* from sometable t where t.array_field[123] = 456". From a theoretical perspective it's enough to express any condition involving finite arrays. In practice all databases support other extensions. For example in Postgres: "select t.* from unnest(array[1,2,3,2,3,5]) item_id left join items t on t.id=item_id".
Perhaps you should look around at ACTUAL modern databases, not your imaginary version of them?
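For readers who don't have Postgres to hand, here is a rough sketch of what the unnest-and-left-join query above computes, written in plain Python. The `items` contents and the helper name `unnest_left_join` are made up for illustration; this only mimics the result set and says nothing about how a real planner executes the query.

```python
# Hypothetical "items" relation: primary key -> name.
items = {1: "bolt", 2: "nut", 3: "washer"}

# The array-valued attribute holding foreign keys. Duplicates and order
# are kept, as with unnest(array[1,2,3,2,3,5]).
item_ids = [1, 2, 3, 2, 3, 5]


def unnest_left_join(keys, relation):
    """Emit one (key, row) pair per array element.

    A key with no matching row yields None, like the NULL a LEFT JOIN
    would produce.
    """
    return [(key, relation.get(key)) for key in keys]


rows = unnest_left_join(item_ids, items)
for key, name in rows:
    print(key, name)
```

The point is only that the array is expanded into one row per element before the join, so the join itself still operates on ordinary atomic values.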
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 7:49 UTC (Wed) by Wol (subscriber, #4433) [Link] (4 responses)
Because I'm being a sod and arguing from C&D? C&D forbids it - yes I know modern "relational" databases do it, but by doing it they break the definition of what a relational database is (as per the people who created the relational database).
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 7:50 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)
It does not. I gave you an example that uses an array and is fully compatible with C&D's formulation of relational algebra.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 8:56 UTC (Wed) by Wol (subscriber, #4433) [Link] (2 responses)
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 6, 2020 2:24 UTC (Sun) by flussence (guest, #85566) [Link] (1 responses)
Array indexing was added to SQL-92 28 years ago with SUBSTRING(), which operates on the character array data type formerly defined in SQL-86/FIPS-127.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 6, 2020 10:09 UTC (Sun) by Wol (subscriber, #4433) [Link]
Surely the fact that it took 22 years to fix what is, imnsho, a serious design flaw, just confirms me in my belief that an RDBMS is a theoretical exercise like Pascal, simplified to make it easy for computers. Unlike Pascal, however, it took off and has seriously hindered data management ever since :-(
To me, it's just second nature to store foreign keys in an array. Let me ask a couple of questions - (1) what percentage of RDBMS programmers today even realise that arrays exist (not the experts, your run-of-the-mill including power users ...). (2) Of them, how many (like me) would use them for foreign keys as a matter of course? and (3) Can a modern RDBMS index the individual atoms in an array? I *hope* the answer is "yes they all can".
(And a fourth - can you put an array in an array? In *most* circumstances yes this is a stupid idea, but sometimes it does make sense ...)
Relational doesn't even eat its own dog food - I'm pretty sure most RDBMSs enforce the rule that every row has a primary key internally - even if it is just an index into a list :-) and I know that on at least one occasion a table I've been dealing with has ended up with a bag in it. I really don't know how we fixed that because all the internal tools assumed that they would be dealing with a single row, not a two-row set, and crashed accordingly.
Or take a table's list of columns - there, I said it, LIST. The order may be unimportant mathematically, but it's extremely important for human comprehension. The RDBMS must have some hidden mechanism to ensure order preservation (and, per a previous post of mine, if it's a hidden field for sort-order, that just goes back to my point about mixing data and meta-data in the same table, a BAD BAD BAD idea!).
Anyway, I'm probably coming over as a fanatic (which I am :-), but I seriously think Relational is badly broken as a design document for a DBMS. (It is, however, a brilliant tool for data analysis - I wouldn't be without it for that!)
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 22:04 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (5 responses)
By "real RDBMS," I refer to software that is actually used in the real world, not the irrelevant opinion of some random bit of academia. Can you identify any "real RDBMS" by this definition that lacks support for ARRAY?
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 23:30 UTC (Tue) by Wol (subscriber, #4433) [Link] (4 responses)
You mean the people who actually defined what a relational database was in the very beginning?
So your definition of a "real RDBMS" is actually a bodge-up to get round a balls-up in the original design.
I'd rather use a DBMS that actually has a solid, coherent, logical design to it, thank you very much.
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 0:19 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link] (2 responses)
You have to be aware at this point that this quixotic insistence that anything that doesn't strictly fit into an original academic definition is not a "real RDBMS", and that only some obscure pet database qualifies, isn't going to be something you are going to find consensus around. The commonly accepted definition of what is an RDBMS today would definitely include the databases you apparently don't want to include, but I doubt you are changing any minds here. Tough luck on that one.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 1:38 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 7:53 UTC (Wed) by Wol (subscriber, #4433) [Link]
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 1:31 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
It would seem your answer is "no," then. So your complaints about RDBMS's have nothing to do with real-world software and are purely conceptual. I therefore see no point in continuing this discussion, because it has nothing to do with what I was originally talking about (complexity *in real software*).
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 21:55 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)
Of course this problem is not intrinsic to the relational calculus, which of course doesn't even have a concept of 'tables', only relations, and does not define anything at all about data storage versus data management. It is perfectly possible for an RDBMS to spot the frequent use of relations with incrementing values as a key and represent it in storage as a list of some kind. It's just that (almost?) none do any such optimization.
(But then, most modern RDBMSs have almost no relationship to the actual relational calculus, which is really quite elegant. SQL and table-based databases... aren't.)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 1, 2020 0:06 UTC (Tue) by Wol (subscriber, #4433) [Link] (2 responses)
Yup. Most modern databases aren't RDBMSs. They claim to be but break all the rules.
> It is perfectly possible for an RDBMS to spot the frequent use of relations with incrementing values as a key and represent it in storage as a list of some kind. It's just that (almost?) none do any such optimization.
I won't say Pick does this by design, it does it more by accident, but it does exactly that! Let's take my invoice example - are line items lines on an invoice or lines in a ledger ... ?
I'd probably store them as objects in their own right, lines in a ledger, with an array in the invoice pointing to them. But I could store them as sub-rows in an invoice, and simply define the ledger as all these subrows in the INVOICE file. Either way, simply accessing the invoice record will optimise access to *all* the associated ledger lines.
Oh - and if I understand you right, those incrementing values - are you incrementing them across the ledger, or incrementing them as part of a compound key in the invoice. Either way involves pain (major pain if you're doing it on a ledger basis) if you want to insert a line, and while it's not so painful it could give you grief deleting lines too. And those values - are they numeric, alphabetic, whatever. The VALUE is irrelevant, it's not data, it's the SORT ORDER that matters, which means we are muddling data and metadata, and pushing stuff into the data management layer that doesn't belong there.
Relational explicitly allows for future optimisation, but also actively hinders said optimisation by saying you MUST use two-dimensional calculus. I'd argue that that itself is massively inefficient. And every time I try to analyse optimisation in Pick, I can't see any way of improving it. That invoice example again - if I access a line via the invoice record, the mere act of accessing it optimises access to all other lines on that invoice. And the probability of me wanting to access one of those lines is much higher than any other line in the ledger. It's much harder to optimise access to any other random ledger line based on selecting one ledger line unless you index the field you've selected on.
I guess, in a way, Pick just indexes all interesting foreign keys by default. Do a decent object/relational analysis and this all just falls out naturally. I actually see the difference between Pick and an RDBMS as relational stores one-dimensional rows in a three-dimensional world. Pick takes real world objects and stores them, and if it's done properly they have a relational analysis done on them and each "atom" in the database is a relational view of a real world object. Hence it's much easier to comprehend, and it's also much more efficient because an object in the world is stored as an atom in the database.
And given that pretty much all "rdbms"s now include lists (arrays) which are most definitely not compatible with a true RDBMS, why not use a list-based database right from the start? You'll get better results than trying to bash square pegs into round holes.
Relational PRESCRIBES data as coming in rows and columns. It can't handle any other sort of data and pretends it doesn't exist. Like geometry used to prescribe parallel lines as never meeting. We threw that out and stepped outside Euclidean geometry. Let's throw out this stupid 2-dimensionality of relational, and step into the n-dimensional world of list-based databases like Pick.
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 13, 2020 16:24 UTC (Sun) by nix (subscriber, #2304) [Link] (1 responses)
This means it cannot be used for actual real-world databases of any scale. The not terribly large financial database systems I used to work on had considerable thought put into table design so as to minimize the number of unnecessary indexes, because when you're talking terabyte-scale tables, indexes are both expensive to compute and take ages to build -- but queries that do not exploit them will effectively never terminate. Indexing all interesting foreign keys automatically (without human input into what defines "interesting") would be an enormous waste of disk space and sacrifice performance to no end. (And we were talking spinning rust here: losing hours to weeks on one indexing operation was not unknown.)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 13, 2020 19:24 UTC (Sun) by Wol (subscriber, #4433) [Link]
I presume by that, you mean size? Bear in mind, for the same amount of user-supplied data, a Pick database probably occupies half the disk space.
And that example of the astronomical database where Oracle had to disable indexing to meet the target, while Cache (not Pick, but similar) sailed right past the target plus 150% ...
I said Pick "in a way, just indexes all interesting foreign keys by default". If I want to know the keys of all the ledger lines in my invoice, I just read that invoice from the client ledger and I've got a list of all the lines - the data record IS the index ... think of a hierarchical database ...
That's fast because once you've got your top-level record you just drill down the links. That's exactly what Pick does, except that with Pick any record can be the top level. If I want a list of all foreign keys associated with an object, I just read the record for that object. Forget about minimising index accesses, Pick minimizes spinning rust accesses. Given an invoice number, how many table and index references do you need to get all the information about the invoice? I'll assume there are ten line items, so you presumably need to select the invoice table - an index access to find the record followed by a table access to get the item itself. Now select the ledger index to find all the line item keys, hopefully it's optimised and gives you the internal key rather than the primary key so that you don't need yet another index lookup to find out where the line item is. How many spinning rust accesses is that? AT LEAST thirteen, maybe more. With Pick it's eleven, one for the invoice, one for each item. That's for eleven different objects. Is it *possible* to improve on that, even theoretically? And your extra two accesses, that's reading the table index. Depending on the size of your table, that could be a LOT of spinning rust that I don't even go near ... (Pick uses dynamically hashed files, so enforces primary keys ...)
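The arithmetic behind that count can be written down as a tiny model. This is only a sketch of the comment's own best-case assumptions (one disk access per index lookup, one per record read, a hashed file that finds the invoice in a single access); the function names are invented for illustration and nothing here is a measurement of any real database.

```python
def relational_accesses(line_items: int) -> int:
    """Best-case accesses for the relational layout described above."""
    invoice_index = 1  # index lookup to locate the invoice row
    invoice_row = 1    # read the invoice row itself
    ledger_index = 1   # index lookup returning the line-item locations
    return invoice_index + invoice_row + ledger_index + line_items


def pick_accesses(line_items: int) -> int:
    """Accesses when the invoice record already lists its line-item keys."""
    invoice_record = 1  # hashed read; the record doubles as the index
    return invoice_record + line_items


print(relational_accesses(10), pick_accesses(10))  # prints "13 11"
```

Under these assumptions the gap is a constant two accesses per invoice, whatever the number of line items; anything beyond that depends on index depth and caching, which the model ignores.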
That's probably behind another favourite of mine where some experts spent six months trying to get Oracle on a twin Xeon 800 to run faster than Pick on a Pentium 90 ...
I guess Pick can be used for far bigger databases than relational because, for any given hardware, the Pick database will store/process twice as much user data ... :-)
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 6:11 UTC (Fri) by mgedmin (subscriber, #34497) [Link]
Quote of the Week material.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 16:38 UTC (Fri) by k8to (guest, #15413) [Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 16:04 UTC (Sat) by marcH (subscriber, #57642) [Link]
Of course higher level user interfaces should be case-insensitive, most are already. This just belongs to neither the filesystem nor the command line.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 19:57 UTC (Fri) by vadim (subscriber, #35271) [Link]
I've encountered a fair number of people who don't seem to have the faintest clue why anybody felt the need to use systemd for instance -- for them SysV scripts and inittab are all that's needed.
I think this can happen quite easily -- all you need to do is to either do the same thing at the same company for decades, settle on a very narrow specialization where you have little clue of what other people are doing, or refuse to do anything new and keep finding jobs where things are done the old fashioned way.
And really, if you're a seriously hardcore Linux guy who hasn't touched anything but Linux in the last decade or two, this whole concern might as well be alien to you. This is more of an issue for people dealing with issues of portability, and not everybody does.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 11:31 UTC (Fri) by khim (subscriber, #9252) [Link]
Rename to all lower-case is not an option.
This would only make sense in an imaginary world where Linux (in conjunction with other OSes with case-sensitive filesystems) took 90% of the desktop.
The Linux community tried to achieve that for a quarter-century - and got nowhere.
That means that at this point choice is between literally millions of lines of code in various libraries and programs - or much smaller number of lines in kernel.
And yes, the cost/benefit ratio clearly shows that having one implementation in the kernel is more maintainable medium-term.
What would happen in the year 2500, when Linux would, finally, achieve desktop dominance - is a question not for us but for our descendants.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 16:00 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)
Well, as I understand it, it's an OPTIONAL addition to ext4. So it will only exist as long as ext4. And presumably it can be left out if you don't want it.
Yes it's a pain dealing with it. But dealing with humans is a pain :-)
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 18:46 UTC (Fri) by tchernobog (guest, #73595) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 13:57 UTC (Sun) by remi.chateauneu (subscriber, #51826) [Link] (1 responses)
But it implies that this C++ project may not build if the sources are moved to another file-system. And of course it might not build on BSD too. And not run depending on the file system, if it opens files like "abc.tmp", "Abc.Tmp" etc... This, just to avoid properly capitalizing header filenames.
To generalize the example "'important report.ods' and 'IMPORTANT REPORT.ods'" to a language with accents, these file names would all "mean the same data":
"œuvrer à un système général.bêta"
"oeuvrer a un systeme general.beta"
"ŒUVRER À UN SYSTÈME GÉNÉRAL,BÊTA"
"œuvrer à un système général.ß"
"oeuvrer-a-un-systeme-general.beta"
... plus combinations of word delimiters like spaces, quotes, tabs, non-breaking spaces, underscores or hyphens (possibly duplicate or missing) etc... because sentences without these "mean the same piece of data".
And these file extensions would point to the same applications, or not ?
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 3, 2020 1:43 UTC (Thu) by draco (subscriber, #1792) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 6, 2020 18:47 UTC (Sun) by scientes (guest, #83068) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 11, 2020 19:59 UTC (Fri) by bartoc (guest, #124262) [Link] (3 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 13, 2020 16:28 UTC (Sun) by nix (subscriber, #2304) [Link] (2 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 13, 2020 19:59 UTC (Sun) by felix.s (guest, #104710) [Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 14, 2020 17:27 UTC (Mon) by nix (subscriber, #2304) [Link]
Normalization vs. Case-sensitivity
Posted Aug 28, 2020 11:38 UTC (Fri) by V02460 (subscriber, #123493) [Link]
What doesn't make sense to me is that the article conflates Unicode normalization and case-insensitivity.
I as a user can't keep the different versions of café apart, so normalization helps me there. For letter-casing instead I don't have a problem keeping e.g. B and b apart from each other.
Citing semantics is a little bit misleading, I think. We wouldn't want filenames with different synonyms to be mapped to the same data as that would be quite arbitrary and a little restrictive. Introducing special rules on usable characters for Latin scripts feels quite arbitrary and a little restrictive to me as well. Adding this change to make a system more compatible, on the other hand, and making it optional as well, sounds like a good idea to me, though.
About search: Why can't search be case-insensitive, even if the files are stored with case?
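To make the distinction concrete, here is a small Python sketch that treats the two operations separately, using only the standard `unicodedata` module. The variable names are mine; it illustrates the Unicode operations themselves, not what ext4 does internally.

```python
import unicodedata

precomposed = "caf\u00e9"  # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)


def nfc(name: str) -> str:
    """Normalize to NFC; this never changes letter case."""
    return unicodedata.normalize("NFC", name)


# The two spellings of café differ as raw strings ...
raw_equal = precomposed == decomposed
# ... but normalization alone makes them compare equal.
normalized_equal = nfc(precomposed) == nfc(decomposed)

# Normalization alone keeps B and b apart ...
case_kept = nfc("B") != nfc("b")
# ... and only an explicit case-folding step merges them.
case_folded_equal = "B".casefold() == "b".casefold()

print(raw_equal, normalized_equal, case_kept, case_folded_equal)
```

So a filesystem could in principle offer normalization without case folding; they are independent steps that happen to be applied together.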
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 12:13 UTC (Fri) by nettings (subscriber, #429) [Link] (8 responses)
For the sake of everyone else, I really hope this thing dies a horrible death on LKML...
Next thing someone comes up with is localizing folder names. Anyone?
$~ echo $LANG
de_DE
$~ ls -al /
/bfe
/benutzer
/bntzr
/bib
/einhngn
/grt
/prg
/proz
/sprg
/stiefel
/vä
/vrbg
*wakes up sweaty and disoriented*
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 12:25 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link] (2 responses)
From the very first sentence of the linked blog
"Linux 5.2 was released over one year ago and with it, a new feature was added to support optimized case-insensitive file name lookups in the Ext4 filesystem - the first of native Linux filesystems to do it."
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 14:19 UTC (Fri) by thumperward (guest, #34368) [Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 6:25 UTC (Sat) by zdzichu (subscriber, #17118) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 12:59 UTC (Fri) by cesarb (subscriber, #6266) [Link] (4 responses)
Doesn't XDG already do this? The folders automatically created on my home directory are named "Área de trabalho", "Documentos", "Downloads", "Imagens", "Modelos", "Música", "Público", "Vídeos". Had my locale been anything other than pt-BR, these folders would have other names.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 21:12 UTC (Fri) by NAR (subscriber, #1313) [Link] (3 responses)
I think Windows does it differently. If I list the directory names of a Windows partition from Linux, I see e.g. /users. If I list the same directory in Windows Explorer (or probably from the system file open dialog) I see Felhasználók (at least in Hungarian Windows). So I think the translation happens in the UI - the dialogs, etc. do not show the physical filename (that's stored on the disk), but translate it. I'm not even sure if it's consistent, some applications have their own dialogs and those do not translate the names (I don't have Windows in front of me right now, so can't check it, but I think e.g. GIMP does not translate the filenames). As far as I noticed, XDG does not do it, I get to see the same filenames regardless of the current locale.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 21:40 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)
Even directly in Windows, if I went in and looked at it using Windows Explorer, I would see c:\Users\Wol\Documents. But if I went in as me, I would see "My Documents".
What I hope Linux does, and I suspect it is the case, is that it canonicalises the name and uses that as the directory name, but saves whatever name the user gave it in a "display name" field, so that is what the user sees. So while the user might type "Foo", "fOO", "foo", whatever, the actual directory entry will always be "foo" or "FOO" depending on which case it chooses to use. So long as the display name is used then the user will see whatever they typed, giving you a "case insensitive but case preserving" system.
Windows has pulled that sort of stunt ever since W95 ...
Cheers,
Wol
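A toy model of that "case insensitive but case preserving" idea, as a Python sketch. The class `CasePreservingDir` and its methods are invented for illustration, and plain `str.casefold()` stands in for whatever canonicalisation a real filesystem would use; this is not a description of ext4's on-disk format.

```python
class CasePreservingDir:
    """Directory keyed by a folded name, remembering the name as typed."""

    def __init__(self):
        # folded key -> (display name as first typed, content)
        self._entries = {}

    def create(self, name: str, content: bytes) -> None:
        key = name.casefold()
        if key in self._entries:
            # "Foo" and "FOO" collide: they are the same entry.
            raise FileExistsError(self._entries[key][0])
        self._entries[key] = (name, content)

    def lookup(self, name: str) -> bytes:
        # Any spelling that folds to the same key finds the entry.
        return self._entries[name.casefold()][1]

    def listing(self) -> list:
        # The user sees the name exactly as it was typed at creation.
        return [display for display, _ in self._entries.values()]


d = CasePreservingDir()
d.create("Foo", b"hello")
print(d.lookup("fOO"), d.listing())
```

Lookups go through the folded key, while listings return the stored display name, which is the split between "directory entry" and "display name" described above.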
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 12:29 UTC (Mon) by milesrout (subscriber, #126894) [Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 23:42 UTC (Sat) by cesarb (subscriber, #6266) [Link]
But yeah, Windows Explorer does things differently. What you see in Windows Explorer is a virtual hierarchy defined as COM objects (the Shell Namespace), not the real filesystem hierarchy, so you can have for instance virtual folders (like the Control Panel) which are visible in that virtual hierarchy but are not in the filesystem. The inconsistency you see is probably between applications using the Shell Namespace and applications using the filesystem directly.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 14:24 UTC (Fri) by dskoll (subscriber, #1630) [Link]
While I understand the need for this feature, I hate the very idea of it with every fiber of my being. Luckily, I don't have to deal with case-insensitivity since I have no day-to-day interaction with anything that needs it. So I just don't enable the feature.
As long as having the feature doesn't impose any penalty for those who choose not to enable it, I think (reluctantly) that this is the appropriate place to put it.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 16:00 UTC (Fri)
by magfr (subscriber, #16052)
[Link] (7 responses)
https://en.m.wikipedia.org/wiki/Dotted_and_dotless_I
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 18:09 UTC (Fri)
by Jonno (subscriber, #49613)
[Link] (6 responses)
> R4: toCasefold( X ): Map each character C in X to Case_Folding(C).
> Case_Folding(C) uses the mappings with the status field value “C” or “F”
> in the data file CaseFolding.txt in the Unicode Character Database.
>
> [...]
>
> D145: A string X is a canonical caseless match for a string Y if and only if:
> NFD(toCasefold(NFD( X ))) = NFD(toCasefold(NFD( Y )))
The Unicode standard also provides guidance for the implementation of "tailored casing operations", including suggested rules for locale-dependent casing operations for Lithuanian, Turkish and Azeri, which is what you are talking about. (Note that the Lithuanian rules do not affect toCasefold, only toUppercase, toLowercase and toTitlecase; and that the rules for Turkish and Azeri are identical.)
Optional support for using Turkic case folding instead of default case folding would be great, and would fit right in as another flag argument. I'm sure patches would be welcome...
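As a rough illustration of the D145 definition quoted above, here is a sketch using Python's standard library. It assumes that `str.casefold()` is an acceptable stand-in for toCasefold, and the Turkic variant is a hypothetical tailoring that applies the two "T" mappings from CaseFolding.txt by hand. It is not the kernel's implementation.

```python
import unicodedata


def canonical_caseless_key(name):
    # NFD(toCasefold(NFD(X))), with str.casefold() standing in for toCasefold
    nfd = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", nfd.casefold())


def canonical_caseless_match(x, y):
    return canonical_caseless_key(x) == canonical_caseless_key(y)


def turkic_caseless_key(name):
    # Hypothetical Turkic tailoring: compose first so that "I" followed by
    # U+0307 is seen as U+0130, then map U+0130 -> "i" and "I" -> U+0131.
    composed = unicodedata.normalize("NFC", name)
    tailored = composed.replace("\u0130", "i").replace("I", "\u0131")
    return canonical_caseless_key(tailored)
```

With default folding, "Floß" matches "FLOSS" but "I" does not match dotless "ı". With the Turkic key, "I" and "ı" fold together while "I" and "i" stay distinct.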
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 18:56 UTC (Fri)
by nave (subscriber, #105585)
[Link] (4 responses)
Files are copied between computers.
Their filenames may have been created with different locale settings.
Support for using Turkic case folding (or any other option) is not enough.
The locale settings must follow the file.
We would need something like the RDF langString: "name@lang", where 'lang' is an IETF BCP 47 language tag.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 0:04 UTC (Sat)
by Jonno (subscriber, #49613)
[Link] (3 responses)
That is impossible, as the locale is not a property of the file name, but of the comparison operation. At best you could set the locale on a per-directory basis, but I hardly see how that would be any better than per filesystem.
(For reference, NTFS uses Turkic case folding for file systems formatted on a Turkish language Windows install, and non-Turkic case folding for file systems formatted on any other language Windows install; and Turkish Windows users seem to deal with it just fine.)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 18:30 UTC (Sat)
by k8to (guest, #15413)
[Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 22:37 UTC (Sat)
by nave (subscriber, #105585)
[Link]
Exactly!
Files are (will be eventually) shared between processes, users, computers, countries.
*Any* *scope* we can choose for locale settings (process, cgroup, user, directory, file system, computer, OS localization, LAN, organization, country) is simultaneously:
- too big to handle all possible filenames correctly (when doing case folding or other human language operation);
- too small to be sure that files will always be moved inside it (never shared outside that scope).
Having filenames with an IETF BCP 47 language tag attached (based on the locale, for example)
may help with the human language operations when a file is shared/copied/moved.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 10:00 UTC (Sun)
by nave (subscriber, #105585)
[Link]
The design is OK *if* described as "support case-insensitive lookups for Windows compatibility".
This is important and very useful work, and I'm grateful for it.
The following justification has the heart in the right place but I think it's giving us *false hope*:
"[...] that is not how humans operate. When people write titles, 'important report.ods' and 'IMPORTANT REPORT.ods' usually mean the same piece of data, and you don't care how it was written when creating it.
We care about the content and the semantics of the words IMPORTANT and REPORT"
We cannot achieve this goal so easily. Many commenters explained why.
To me it seems that the core of the issue is **closed vs. open world** bias.
Locales support human language- / culture-specific processing of *curated collections* = closed worlds.
Example: sort order or case-folding in a dictionary, book index, document archive.
I agree that a single directory, and maybe a whole tree (a file system instance), can be managed like a curated collection, a closed world.
But can we expect *careful curation* from users who do *not* care about case? *These* users were mentioned to justify the need. I don't think it will work.
An *open world* is the more general case that I care about, and the only realistic expectation:
file systems should store and find later *all* files that we acquire, which may come from any other computer, locale, OS localization, organization, country.
> For reference, NTFS uses Turkic case folding for file systems formatted on a Turkish language Windows install,
> and non-Turkic case folding for file systems formatted on any other language Windows install;
> Turkish Windows users seem to deal with it just fine.
Let's assume Turkish Windows users exchanging files are satisfied with the handling of dotted and dotless I in filenames.
Does everything work as expected when their files, or whole trees, go through other computers, or just flash drives formatted by non-Turkish users?
I'd expect that they do what most of us do: for successful exchange avoid fancy names, the US-ASCII subset is safest, etc.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 6:27 UTC (Fri)
by mgedmin (subscriber, #34497)
[Link]
*googles, looks it up in SpecialCases.txt*
Ah, it's about accented text which is basically only used in dictionaries and textbooks to indicate which syllable is stressed. Lowercase i retains its dot when an additional accent indicating stress is placed on it, which requires an extra Unicode combining character that needs to be explicitly dropped when uppercasing.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 16:03 UTC (Fri)
by Chousuke (subscriber, #54562)
[Link] (12 responses)
What happens with case-insensitive ext4 if you copy over files with the same name but different case from another filesystem? Do you just destroy data silently, or does it actually complain?
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 16:13 UTC (Fri)
by Chousuke (subscriber, #54562)
[Link] (1 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 16:54 UTC (Fri)
by dvdeug (guest, #10998)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 18:21 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (8 responses)
For even more fun, imagine the source and/or destination have per-directory sensitivity.
Sheer insanity.
PS: also realized this can probably be tested with Windows today. Too bad life is too short, already wasted enough time with Windows and issues like these.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 11:18 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (4 responses)
The user supplied name is used when passing it somewhere else that is not (immediately) accessing the file.
So a copy uses the user-supplied name in user-space, the file systems at either end canonicalise that name to ensure uniqueness. The only way we can then get grief of inaccessible files (yes Apple had that problem) is if we change the canonicalisation rule on an active file system. BAD IDEA!
Analyse the problem then the solution is obvious - use user names in user space, and system names in system space!
Cheers,
Wol
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 17:02 UTC (Mon)
by marcH (subscriber, #57642)
[Link] (3 responses)
Not my NTFS experience, so while researching it I found that case-insensitivity is NOT implemented at the NTFS filesystem level?!?
https://www.betaarchive.com/wiki/index.php/Microsoft_KB_A...
Unless it is now?!? http://drewthaler.blogspot.com/2007/12/case-against-insen...
> NTFS: Case-insensitive in different ways depending on the version of Windows that created the volume.
What a total mess...
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 17:08 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
It is implemented in the FS. You have to do case folding in the FS driver. Moreover, NT used to actually store the case conversion table in a special hidden file on NTFS, so you can implement it by doing a simple lookup in a 16-bit table.
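To make the "simple lookup in a 16-bit table" concrete, here is a sketch of the general technique. The table is built from Python's own Unicode data and is only an assumption for illustration; it is not the contents of any real NTFS volume's table, and the function names are invented for the example.

```python
import struct


def build_upcase_table():
    # One entry per 16-bit code unit; the default is "maps to itself".
    table = list(range(0x10000))
    for unit in range(0x10000):
        if 0xD800 <= unit <= 0xDFFF:
            continue  # surrogate code units are left unchanged
        upper = chr(unit).upper()
        # Keep only simple one-to-one mappings that stay within 16 bits.
        if len(upper) == 1 and ord(upper) < 0x10000:
            table[unit] = ord(upper)
    return table


UPCASE = build_upcase_table()


def utf16_units(name):
    data = name.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(data) // 2), data)


def names_equal(a, b):
    # Compare unit by unit through the table; no locale, no normalization.
    ua, ub = utf16_units(a), utf16_units(b)
    return len(ua) == len(ub) and all(
        UPCASE[x] == UPCASE[y] for x, y in zip(ua, ub)
    )
```

A per-unit table like this can only express one-to-one mappings, so "Data" and "data" compare equal but "Floß" and "FLOSS" do not.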
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 17:20 UTC (Mon)
by marcH (subscriber, #57642)
[Link] (1 responses)
Since you seem knowledgeable about this, would you know why Microsoft apparently tried to delete this KB from the Internet? And also comment on the "depending on the Windows version" quote?
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 17:27 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link]
It used to be a per-FS flag, actually.
> Since you seem knowledgeable about this, would you know why Microsoft apparently tried to delete this KB from the Internet?
MS doesn't really delete KB articles, they just constantly change the way they're organized. And they recently started expiring the old articles. They are still available through KB archive if you need them, though.
The archive states that the article applied to:
> Microsoft Windows NT Advanced Server 3.1
> Microsoft Windows NT Workstation 3.1
Which are long dead and gone.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 6:30 UTC (Fri)
by mgedmin (subscriber, #34497)
[Link] (2 responses)
Have you ever mounted a VFAT-formatted USB drive on a Linux system and copied files between it and elsewhere? This is not a new, never-heard-before, oh the calamity! situation.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 8:55 UTC (Fri)
by marcH (subscriber, #57642)
[Link] (1 responses)
Yes and that's exactly why the idea of the same quirks but on a much larger scale is scary.
On Linux people script cp -R and rsync without even thinking about it. robocopy always sounds like an adventure.
Anyway, the comment you're answering was about per-directory case sensitivity; I fail to see how this VFAT example is related.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 16:35 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 2, 2020 1:20 UTC (Wed)
by riking (subscriber, #95706)
[Link]
Your program can check for +F on the folder if you need to deal with this.
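A minimal sketch of such a check on Linux, reading the inode flags with the `FS_IOC_GETFLAGS` ioctl and testing the casefold bit (`FS_CASEFOLD_FL` in `<linux/fs.h>`). The ioctl number is computed here assuming the generic encoding used on x86 and ARM, and the function names are made up for the example.

```python
import fcntl
import os
import struct

FS_CASEFOLD_FL = 0x40000000  # "+F" attribute, from <linux/fs.h>
# _IOR('f', 1, long), assuming the generic ioctl number layout (x86, ARM)
FS_IOC_GETFLAGS = (2 << 30) | (struct.calcsize("l") << 16) | (ord("f") << 8) | 1


def has_casefold_flag(flags):
    return bool(flags & FS_CASEFOLD_FL)


def directory_is_casefolded(path):
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, bytes(struct.calcsize("l")))
    except OSError:
        # The filesystem does not expose inode flags; treat as "not set".
        return False
    finally:
        os.close(fd)
    # The kernel fills in a native int at the start of the buffer.
    (flags,) = struct.unpack("I", buf[:4])
    return has_casefold_flag(flags)
```

Returning False when the ioctl fails is a design choice for this sketch; a caller that needs to tell "unsupported" apart from "not set" would want to surface the error instead.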
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 17:08 UTC (Fri)
by jch (guest, #51929)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 19:15 UTC (Fri)
by cpitrat (subscriber, #116459)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 28, 2020 22:31 UTC (Fri)
by Matlib (guest, #134276)
[Link] (9 responses)
I've done a number of Debian and Ubuntu installations in the past for all sorts of people, including those who didn't really notice any difference whether caps lock was on or off. I don't recall anyone complaining about case-sensitive names.
The save dialog could ask for confirmation if the name is too similar to an existing one. Even better, the drop-down list may show similarly named files when typing. This falls more into UX enhancement category.
Anyway, what did they complain about then?
- #0 – that there was no confirmation on delete
- (linked problem) – that trash folders were created on memory sticks
- incompatibilities between OO/LO and MS Office
SpaceFM fortunately solved the first two though.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 0:47 UTC (Sat)
by rgmoore (✭ supporter ✭, #75)
[Link] (8 responses)
The big problems with Linux being case sensitive come when it's interacting with other operating systems. For example, at my work we run our scientific instruments on Windows because that's what the instrument control software requires, but we archive our data to a Linux box using rsync. The archive box then shares the data using Samba so we can access years worth of older data on our Windows machines.
We recently ran into a big hassle when we updated one of our machines to Windows 10 and I accidentally named the data directory "Data" instead of "data". The Linux archive box treated this as a different directory and added the new data to the new directory. When the Samba server served it, it showed two directories, "Data" and "data", but Windows showed their contents as being the same, so we couldn't access our older data. We were eventually able to sort things out by renaming the "data" directory to "old_data", but it was an unnecessary difficulty. Having a case-insensitive filesystem would have avoided the whole problem. Sure, you can blame Windows for the problem rather than Linux, but if you want to use Linux boxes to serve files for Windows computers, they need to be able to do things in a Windows-friendly fashion.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 2:35 UTC (Sat)
by gb (subscriber, #58328)
[Link] (6 responses)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 7:06 UTC (Sat)
by zorro (subscriber, #45643)
[Link] (4 responses)
They already have. See https://petri.com/turn-windows-10-ntfs-case-sensitivity
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 2:08 UTC (Sun)
by zlynx (guest, #2285)
[Link] (2 responses)
I tried stripping a virtual machine Windows boot drive of all its 8.3 compatibility names once. I was shocked at how many "modern" Windows programs use C:\PROGRA~1 as some kind of shortcut to Program Files. Those are probably the same programs that would miserably fail if the boot drive was F not C.
I believe this kind of thing is why REFS is limited to data drives only.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 18:21 UTC (Sun)
by khim (subscriber, #9252)
[Link] (1 responses)
I actually had Windows 98 (when that was a thing) installed on drive D. Surprisingly few programs failed. But the amount of pain it took to make installers work... it's just not worth it.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 16:00 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 13:56 UTC (Mon)
by eru (subscriber, #2753)
[Link]
> They already have. See https://petri.com/turn-windows-10-ntfs-case-sensitivity
I suspect even with that setting, con, aux, prn etc are still reserved names..
Just for fun, if you have access to SharePoint or OneDrive, try uploading a file named aux.txt from Linux, using the web browser interface. You will get a complaint that your file name contains invalid characters!
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 18:18 UTC (Sun)
by khim (subscriber, #9252)
[Link]
Because most app developers use Windows.
> Can't Windows be Linux friendly?
It can but not by default. And even then - it wouldn't magically fix programs.
> What about making windows case-sensitive?
Out of the question because this would break these same programs.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 18:34 UTC (Sat)
by k8to (guest, #15413)
[Link]
I was sceptical above, but not now
Posted Aug 29, 2020 2:24 UTC (Sat)
by gus3 (guest, #61103)
[Link]
TL;DR: I get it. I'm on board.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 6:47 UTC (Sat)
by kunitz (subscriber, #3965)
[Link] (8 responses)
There are multiple reasons why the addition of the feature doesn't make a lot of sense.
- Unicode capitalization rules are changing with the consequence that the exact behavior of the feature will depend on the Unicode version supported by the kernel.
- User-space software can still not rely on the feature being available, so software would need to check whether the feature is supported and implement a fallback if it is not.
- It will not be used widely and therefore not sufficiently tested; so the feature will break silently at one point in the future.
This is a typical might-be-useful feature that adds complexity and needs to be maintained forever.
The Floß/FLOSS example made me laugh because since 2017 the official rules of the German language allow the use of a capital ß in addition to the replacement by SS. Typographers have been discussing the capital letter ß for over a century. More about it in the Wikipedia entry.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 11:47 UTC (Sat)
by james (subscriber, #1325)
[Link] (4 responses)
I don't know about "won't be used widely."
If Samba were to support it, I can imagine that Synology and other manufacturers of Linux-based NAS devices would want to use it, if only because it might help performance in reviews.
And if they use it, they have commercial reasons to test it.
(A quick search doesn't turn up any Samba patches that take advantage of this: just complaints that checking that none of the files in a very large directory matches a particular filename is very expensive.)
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 13:33 UTC (Sat)
by barryascott (subscriber, #80640)
[Link] (2 responses)
And Fedora is moving to btrfs.
In both cases, unless this feature is added to those filesystems, isn't it less interesting?
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 18:46 UTC (Sat)
by zdzichu (subscriber, #17118)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 12:32 UTC (Sun)
by james (subscriber, #1325)
[Link]
Commercial NAS devices come with pre-installed firmware from the manufacturer, running the way they want.
So the choice of filesystem is up to them: they also get to specify which filesystem features are enabled and which software is provided.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 2:52 UTC (Mon)
by jra (subscriber, #55261)
[Link]
We've run on systems that are case-insensitive forever.
All you need to do is tell Samba via the smb.conf that the system is case insensitive, so that doing a [l]stat given a name should always succeed if the file exists. On ENOENT we then don't do the expensive search. It's really that simple.
set:
case sensitive = yes
leave the rest as default and you're done.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 18:06 UTC (Sat)
by mariofutire (guest, #141044)
[Link] (1 responses)
No application can rely on it, as it depends on the way the user has created / mounted the filesystem.
So now it is even more confusing because apps need to run in an environment which is a moving target, implementing fallback or workarounds.
It is a feature only useful to individual users, not to the community as a whole in my opinion.
If this is solving a wine problem, it could have been done with a system call or via a new parameter, which they (i.e. wine) can control.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 12:45 UTC (Mon)
by kevincox (guest, #93938)
[Link]
Applications can check for the feature, and refuse to run or switch to a fallback. I agree that it will be many years until this can be assumed to be available (counting from when the default is switched to enabled), but programs where the performance matters can already start checking for this.
> If this is solving a wine problem, it could have been done with a system call or via a new parameter, which they (i.e. wine) can control.
It can't be done this way with good performance. I think the only way that it could be done is if wine stored files normalized, but kept the original name stashed somewhere. However this is a lot of complexity and doesn't work for directories not completely managed by wine.
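One way an application might probe for this at runtime, as a sketch: create a scratch file in the directory of interest and see whether a differently-cased spelling of its name resolves to the same file. The function name and the probe prefix are invented for the example, and it assumes the process may write to that directory.

```python
import os
import tempfile


def lookups_ignore_case(directory):
    # The prefix contains cased letters, so swapcase() always yields a
    # different spelling of the probe's name.
    with tempfile.NamedTemporaryFile(dir=directory, prefix="CaseProbe_") as probe:
        head, tail = os.path.split(probe.name)
        swapped = os.path.join(head, tail.swapcase())
        try:
            # samefile() compares device and inode, so an unrelated file
            # that happens to have the swapped name does not count.
            return os.path.samefile(probe.name, swapped)
        except FileNotFoundError:
            return False
```

The result applies only to the probed directory, which matters when case-insensitivity is a per-directory attribute rather than a property of the whole filesystem.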
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 18:50 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link]
Not true, see https://www.unicode.org/policies/stability_policy.html, which specifically notes that case-folding is stable from Unicode 5.0 onwards. It may change, but it will not change in a way that would be "noticeable" to strings only containing characters from a previous version of Unicode.
> User-space software can still not rely on the feature being available, so software would need to check whether the feature is supported and implement a fallback if it is not.
This has been used as an argument against every new feature of every piece of software since the dawn of time.
> It will not be used widely and therefore not sufficiently tested; so the feature will break silently at one point in the future.
There are a good half-dozen comments just on this article explaining why people want this feature, in a variety of contexts (mostly relating to Windows interoperability). Maybe you don't use Windows, but it's ridiculous to claim that Windows is "not used widely."
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 10:50 UTC (Sat)
by tilt12345678 (subscriber, #126336)
[Link]
And for that purpose, to enable it as an option on a Linux-hosted fileshare, I welcome case-insensitivity; it provides a very useful feature in hybrid environments.
Thanks for the hard work.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 29, 2020 16:09 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (1 responses)
Oh, wait... https://github.com/microsoft/vscode-cmake-tools/issues/531
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 12:05 UTC (Fri)
by riking (subscriber, #95706)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 14:05 UTC (Sun)
by gray_-_wolf (subscriber, #131074)
[Link] (6 responses)
What about same looking but technically different characters?
I just wonder what the line here is.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 30, 2020 18:32 UTC (Sun)
by khim (subscriber, #9252)
[Link] (5 responses)
The line is where Windows draws the line, ultimately.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 4:16 UTC (Mon)
by marcH (subscriber, #57642)
[Link] (4 responses)
Thanks to this and other comments I finally understand what this feature truly is: _Windows compatibility_. It should really be called like that instead of "case-insensitive filesystem" that never made much technical sense because of all the possible variations, configurations, evolutions, incompatibilities, bugs, complexity and other corner cases. Case is a natural language and informal concept after all, for instance most French people believe the capital letter for "é" is "E" (no accent) while all professional newspapers and books use "É" (the latter is very difficult to enter on Windows, which partly explains the former)
"The Windows implementation is the specification" finally does make some sense. I mean it still doesn't make sense but at least it provides a "technical" and "formal" specification for it. Don't forget to include the Windows and Unicode version numbers in the feature name too and maybe the Windows locale too.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 8:12 UTC (Mon)
by abo (subscriber, #77288)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 15:44 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
If that's the case, are all of the other Windows filenaming rules also being enforced? No trailing spaces, periods, special names, etc.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 16:26 UTC (Mon)
by paultaysom (guest, #141070)
[Link]
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 3, 2020 14:52 UTC (Thu)
by khim (subscriber, #9252)
[Link]
It's Linux we are talking about. So obviously the old rule "if nobody notices, it's not broken" is in effect.
The goal is not to faithfully reproduce Windows behavior, the goal is to make all these billions of lines of code written for Windows useful.
And while there is an enormous corpus of code which creates "SomeDataFile.DAT" and then tries to read "somedatafile.data"... all these special names are rarely a problem in practice.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Aug 31, 2020 4:20 UTC (Mon)
by marcH (subscriber, #57642)
[Link]
git clone linux
cd linux
git status
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: include/uapi/linux/netfilter/xt_CONNMARK.h
modified: include/uapi/linux/netfilter/xt_DSCP.h
modified: include/uapi/linux/netfilter/xt_MARK.h
....
modified: net/netfilter/xt_DSCP.c
modified: net/netfilter/xt_HL.c
modified: net/netfilter/xt_RATEEST.c
modified: net/netfilter/xt_TCPMSS.c
Back to a case-sensitive system:
git ls-tree --name-only v5.7 - net/netfilter/xt_* | sort -f -k4
net/netfilter/xt_dscp.c
net/netfilter/xt_DSCP.c !!!
net/netfilter/xt_ht.c
net/netfilter/xt_HT.c
etc.
Who thought this was a good idea?
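For anyone wanting to find such pairs before checking a tree out onto a case-insensitive directory, a small sketch follows. It groups paths by a folded key, using NFD plus Python's `str.casefold()` as an assumed approximation of the filesystem's comparison; the function names are invented for the example.

```python
from collections import defaultdict
import unicodedata


def fold(path):
    # Assumed approximation of a case-insensitive comparison key.
    return unicodedata.normalize("NFD", path).casefold()


def case_collisions(paths):
    groups = defaultdict(list)
    for path in paths:
        groups[fold(path)].append(path)
    # Keep only keys that more than one distinct spelling maps to.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Fed the output of `git ls-tree -r --name-only HEAD`, one path per list element, this would report groups such as net/netfilter/xt_DSCP.c and net/netfilter/xt_dscp.c.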
I don't know if this serves a purpose
Posted Aug 31, 2020 10:55 UTC (Mon)
by xophos (subscriber, #75267)
[Link]
Applying the stated logic to its conclusion, "Report, important" should also be the same file. So we need a dictionary and an AI to determine which filenames are the same.
If that is too far-fetched for you, just consider Unicode code points that have the same or similar-looking characters attached.
Those should clearly be the same too!
The way I see it, the only use case is easier Windows emulation. Whether that is worth the effort is debatable, but at least be honest about it.
Krisman: Using the Linux kernel's Case-insensitive feature in Ext4
Posted Sep 4, 2020 2:42 UTC (Fri)
by RogerOdle (subscriber, #60791)
[Link]
For software development, the file system should assure that there is only one way to spell something and it should be an error if you use the wrong case.