diff --git a/core/docs/Changelog.md b/core/docs/Changelog.md index ebdbffca7a..5ff672c43e 100644 --- a/core/docs/Changelog.md +++ b/core/docs/Changelog.md @@ -4,6 +4,9 @@ * Breaking: In `FileSystem.Path` module the default for `eqPath` changed on Windows to case-sensitive comparison. +* Breaking: A leading "." component (e.g. "." or "./x") is no longer + treated as a rooted path, making the behavior more in line with + intuitive expectation. * Breaking: In `FileSystem.Path` module the default for `eqPath` changed on both Posix and Windows so that `allowRelativeEquality` is `True` by default. Literally identical relative paths (e.g. `./x` and `./x`, or diff --git a/core/src/Streamly/Internal/FileSystem/Path.hs b/core/src/Streamly/Internal/FileSystem/Path.hs index 7dffead175..f6ba8049d7 100644 --- a/core/src/Streamly/Internal/FileSystem/Path.hs +++ b/core/src/Streamly/Internal/FileSystem/Path.hs @@ -6,138 +6,35 @@ -- Maintainer : streamly@composewell.com -- Portability : GHC -- +-- See docs/Developer/FileSystem.Path.md for design doc. +-- -- The API in this module is equivalent to or can emulate all or most of -- the filepath package API. It has some differences from the filepath -- package: -- --- 1. Empty paths are not allowed. Paths are validated before construction. --- 2. The default Path type itself affords considerable safety regarding the +-- 1. The append operation follows path construction semantics rather than +-- path resolution and navigation based semantics used by the operation in +-- the filepath package. Run time failures are preferable to silent problems. +-- 2. Empty paths are not allowed. Paths are validated before construction. +-- 3. The default Path type itself affords considerable safety regarding the -- distinction of rooted or non-rooted paths, it also allows distinguishing -- directory and file paths. --- 3. It is designed to provide flexible typing to provide compile time safety +-- 4. 
It is designed to provide flexible typing to provide compile time safety -- for rooted/non-rooted paths and file/dir paths. The Path type is just part -- of that typed path ecosystem. Though the default Path type itself should be -- enough for most cases. --- 4. It leverages the streamly array module for most of the heavy lifting, +-- 5. It leverages the streamly array module for most of the heavy lifting, -- it is a thin wrapper on top of that, improving maintainability as well as -- providing better performance. We can have pinned and unpinned paths, also -- provide lower level operations for certain cases to interact more -- efficiently with low level code. +-- 6. The share name is part of the root when we split the root; this lets us +-- always treat the server and share name in a case-insensitive manner, while +-- the remaining path can be normalized as case sensitive or insensitive. -- -- It builds on arrays, has a richer API, consistent API, streaming ops where -- it makes sense, performance is primary goal. -- --- == References --- --- * https://en.wikipedia.org/wiki/Path_(computing) --- * https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file --- * https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/62e862f4-2a51-452e-8eeb-dc4ff5ee33cc --- --- == Windows and Posix Paths --- --- We should be able to manipulate windows paths on posix and posix paths on --- windows as well. Therefore, we have WindowsPath and PosixPath types which --- are supported on both platforms. However, the Path module aliases Path to --- WindowsPath on Windows and PosixPath on Posix. --- --- == File System as Tree vs Graph --- --- A file system is a tree when there are no hard links or symbolic links. But --- in the presence of symlinks it could be a DAG or a graph, because directory --- symlinks can create cycles. 
--- --- == Rooted and Branch paths --- --- We make two distinctions for paths, a path may a specific filesystem root --- attached to it or it may be a free branch without a root attached. --- --- A path that has a root attached to it is called a rooted path e.g. /usr is a --- rooted path, . is a rooted path, ./bin is a rooted path. A rooted path could --- be absolute e.g. /usr or it could be relative e.g. ./bin . A rooted path --- always has two components, a specific "root" which could be explicit or --- implicit, and a path segment relative to the root. A rooted path with a --- fixed root is known as an absolute path whereas a rooted path with an --- implicit root e.g. "./bin" is known as a relative path. --- --- A path that does not have a root attached but defines steps to go from some --- place to another is a path branch. For example, "local/bin" is a path branch --- whereas "./local/bin" is a rooted path. --- --- Rooted paths can never be appended to any other path whereas a branch can be --- appended. --- --- The rooted/unrooted path concept is especially useful on windows. Windows is --- different in that C:x is curdir relative path, /x is curdrive relative path. --- Even though these paths are relative they cannot be appended to other paths. --- The only relative path that can appended is "./x". Ideally, we should be --- able to append C:x and C:y to C:x/y if we treat them as ./x and ./y but we --- can't, only "." has that treatement that it can be removed and made a path --- segment. --- --- == Comparing Paths --- --- We can compare two absolute rooted paths or path branches but we cannot --- compare two relative rooted paths. If each component of the path is the same --- then the paths are considered to be equal. --- --- == Implicit Roots (.) --- --- On Posix and Windows "." implicitly refers to the current directory. On --- Windows a path like @/Users/@ has the drive reference implicit. 
Such --- references are contextual and may have different meanings at different --- times. --- --- @./bin@ may refer to a different location depending on what "." is --- referring to. Thus we should not allow @./bin@ to be appended to another --- path, @bin@ can be appended though. Similarly, we cannot compare @./bin@ --- with @./bin@ and say that they are equal because they may be referring to --- different locations depending on in what context the paths were created. --- --- The same arguments apply to paths with implicit drive on Windows. --- --- We can treat @.\/bin\/ls@ as an absolute path with "." as an implicit root. --- The relative path is "bin/ls" which represents steps from somewhere to --- somewhere else rather than a particular location. We can also call @./bin@ --- as a "rooted path" as it starts from particular location rather than --- defining "steps" to go from one place to another. If we want to append such --- paths we need to first make them explicitly relative by dropping the --- implicit root. Or we can use unsafeAppend to force it anyway or unsafeCast --- to convert absolute to relative. --- --- On these absolute (Rooted) paths if we use takeRoot, it should return --- RootCurDir, RootCurDrive and @Root Path@ to distinguish @./@, @/@, @C:/@. We --- could represent them by different types but that would make the types even --- more complicated. So runtime checks are are a good balance. --- --- Path comparison should return EqTrue, EqFalse or EqUnknown. If we compare --- these absolute/located paths having implicit roots then result should be --- EqUnknown or maybe we can just return False?. @./bin@ and @./bin@ should be --- treated as paths with different roots/drives but same relative path. The --- programmer can explicitly drop the root and compare the relative paths if --- they want to check literal equality. --- --- Note that a trailing . or a . in the middle of a path is different as it --- refers to a known name. 
--- --- == Ambiguous References (..) --- --- ".." in a path refers to the parent directory relative to the current path. --- For an absolute root directory ".." refers to the root itself because you --- cannot go further up. --- --- When resolving ".." it always resolves to the parent of a directory as --- stored in the directory entry. So if we landed in a directory via a symlink, --- ".." can take us back to a different directory and not to the symlink --- itself. Thus @a\/b/..@ may not be the same as @a/@. Shells like bash keep --- track of the old paths explicitly, so you may not see this behavior when --- using a shell. --- --- For this reason we cannot process ".." in the path statically. However, if --- the components of two paths are exactly the same then they will always --- resolve to the same target. But two paths with different components could --- also point to the same target. So if there are ".." in the path we cannot --- definitively say if they are the same without resolving them. --- -- == Exception Handling -- -- Path creation routines use MonadThrow which can be interpreted as an Either diff --git a/core/src/Streamly/Internal/FileSystem/Path/Common.hs b/core/src/Streamly/Internal/FileSystem/Path/Common.hs index 642274a74a..6f2221db19 100644 --- a/core/src/Streamly/Internal/FileSystem/Path/Common.hs +++ b/core/src/Streamly/Internal/FileSystem/Path/Common.hs @@ -6,6 +6,8 @@ -- Maintainer : streamly@composewell.com -- Portability : GHC -- +-- See docs/Developer/FileSystem.Path.md for design doc. 
+-- module Streamly.Internal.FileSystem.Path.Common ( -- * Types @@ -13,7 +15,6 @@ module Streamly.Internal.FileSystem.Path.Common -- * Validation , validatePath - , validatePath' , validateFile -- * Construction @@ -349,214 +350,6 @@ hasDrive a = Array.length a >= 2 && unsafeHasDrive a isDrive :: (Unbox a, Integral a) => Array a -> Bool isDrive a = Array.length a == 2 && unsafeHasDrive a ------------------------------------------------------------------------------- --- Relative or Absolute Paths ------------------------------------------------------------------------------- --- --- Relative (no external state except cwd) --- RelFree -- x, ./x -- both curdir and curdrive are unspecified --- --- AnchPath -- partially relative, dir or drive are specified (Windows only) --- Anchored paths can be classified into two categories: --- AnchorDrv, AnchorDir? --- RelCurDirOnly -- C:x -- drive specified, path relative to current dir on that drive --- RelCurDriveOnly -- \x -- absolute path on current drive (root-relative) --- --- AbsPath -- fully anchored, both drive and dir are specified --- AbsDrive -- C:\x --- AbsUNC -- \\server\share\x --- AbsDevice -- \\?\..., \Device\... --- --- On Posix only these categories exist: --- RelPath : RelFree --- AbsPath : AbsDrive ("/x") --- AnchPath : None --- --- data PathType --- = AbsPath --- | RelPath --- | AnchPath --- --- When appending, do not insert a separator after a bare drive (C:). For all --- practical purposes a bare "C:" can be treated as "C:." and then we do not --- need this special treatement wrt separators. --- --- C: x -> C:x --- --------------------------------------------------------------------------------- --- PATH NAVIGATION SEMANTICS (follow) --------------------------------------------------------------------------------- --- --- "follow" navigates first path followed by the second. In other words, --- "follow p1 p2" interprets p2 in the context of p1. 
--- --- Operationally: --- cd (follow p1 p2) == cd p1; cd p2 --- --- That is, p2 is resolved relative to the location denoted by p1. --- The two paths denote a sequence of resolution operations, we resolve p1 and --- then we resolve p2 with respect to p1. --- --- Note that this operation is total and never results in an error. --- --- Rules: --- --- 1. If p2 is Relative: --- --- Absolute Relative -> Absolute --- Relative Relative -> Relative --- Anchored Relative -> Anchored --- --- (p2 is appended to p1) --- --- 2. If p2 is Absolute: --- --- Any Absolute -> Absolute (p2 wins) --- --- 3. If p2 is RelCurDirOnly (C:y), if the drive is the same then combine --- otherwise take the second path. If the drive is not specified then it is --- considered to be different. --- --- C: C:y -> C:y -- C: equiv C:. --- C:x C:y -> C:x/y --- C:/x C:y -> C:/x/y --- D:x C:y -> C:y --- --- /x C:y -> C:y --- x C:y -> C:y --- --- The "cd" semantics can be incorrect for the last two if we assume the --- drive of the first path to be same as the second. --- --- 4. If p2 is RelCurDriveOnly (\y), discard LHS, if LHS has drive keep the drive: --- --- C: \y -> C:\y -- C: equiv C:. --- C:/ \y -> C:\y --- C:/x \y -> C:\y --- C:x \y -> C:\y --- \x \y -> \y --- x \y -> \y --- --- For the first 3 cases above, UNC behaves the same as a drive root: --- --- \\server\share\x \y -> \\server\share\y --- --- These are based on how python 'ntpath' module behaves. --- --------------------------------------------------------------------------------- --- PATH CONSTRUCTION SEMANTICS (append) --------------------------------------------------------------------------------- --- --- append constructs paths structurally. The second argument must be such that --- it can be interpreted relative to the first. While "follow" is total, --- "append" is partial and can result in runtime errors. --- --- append p r extends p with the segments of r. --- --- Rules: --- --- 1. 
Always valid if r is relative: --- --- appendAbs :: AbsPath -> RelPath -> AbsPath --- appendRel :: RelPath -> RelPath -> RelPath --- appendAnch :: AnchPath -> RelPath -> AnchPath --- --- 2. Never valid if r is AbsPath: --- --- / /x -> error -- can be allowed, but no exception --- p AbsPath -> error --- --- 3. Identity: --- --- "." is the empty relative path, it is identity of composition: --- --- appendAbs p "." == p --- appendRel p "." == p --- appendRel "." p == p --- appendAnch p "." == p --- --- 4. Associativity (via RelPath): --- --- append (append p a) b == append p (a <> b) --- --- Notes: --- --- - "." is not an anchor; it is the identity element of relative paths. --- - On Windows AnchoredPath can only start with "\" or "C:", it cannot start --- with "C:\" as that would make it an AbsPath. --- --------------------------------------------------------------------------------- --- Handling Anchored Paths --------------------------------------------------------------------------------- --- --- If second path is Anchored, and has the same Anchor as the first path, then --- strip the Anchor into a Maybe Drive and a Relative or / Anchored path and --- then apply the same rules as above considering the / Anchored path as --- absolute. --- --- 1. If p2 is RelCurDirOnly (C:y) (Anchored), if both the paths have drive and --- it is the same then combine otherwise it is runtime error. --- --- C:\x C:y -> C:\x\y --- C: C:y -> C:y -- C: equiv C:. --- C:x C:y -> C:x\y --- --- D:x C:y -> error --- \x C:y -> error --- x C:y -> error --- --- 2. If p2 is RelCurDriveOnly (\y) (Anchored). p2 is absolute within the --- drive, therefore, similar to the absolute path rules, not allowed. --- --- C:\ \y -> error -- can be allowed, but no exceptions --- C:\x \y -> error --- C: \y -> error -- C: is equiv C:. 
which is a relative path --- C:x \y -> error --- \x \y -> error --- x \y -> error --- --- For the first 3 cases above, UNC behaves the same as a drive root: --- --------------------------------------------------------------------------------- --- Typed paths --------------------------------------------------------------------------------- --- --- Types: --- --- appendAbs :: AbsPath -> RelPath -> AbsPath --- appendRel :: RelPath -> RelPath -> RelPath --- --- They can be combined into a single operation using an IsPath typeclass. --- --- Windows specific: --- --- appendAnch :: AnchPath -> RelPath -> AnchPath --- --- To append Anchored paths remove the anchor first: --- --- splitAnchor :: AnchPath -> (Maybe Drive, p) --- --- where p is either RelPath or AnchPath (e.g. /x) type. --- --- combineAnch :: AnchPath -> AnchPath -> Maybe AnchPath --- splitAnchor -> --- if both have a common drive --- then --- if second path splits to (_, RelPath) --- then Just --- else Nothing --- else Nothing --- --------------------------------------------------------------------------------- --- SUMMARY --------------------------------------------------------------------------------- --- --- follow = resolution (contextual, may override) --- append = construction (structural, no override) --- --- follow models filesystem navigation semantics --- append models path construction semantics - -- | A path relative to cur dir i.e. either equal to @.@ or starts with @./@. -- It has a leading dot component. isRelativeCurDir :: (Unbox a, Integral a) => OS -> Array a -> Bool @@ -642,24 +435,19 @@ isAbsolute Windows arr = ------------------------------------------------------------------------------ -- Note: paths starting with . or .. are ambiguous and can be considered --- segments or rooted. We consider a path starting with "." as rooted, when --- someone uses "./x" they explicitly mean x in the current directory whereas --- just "x" can be taken to mean a path segment without any specific root. 
--- However, in typed paths the programmer can convey the meaning whether they --- mean it as a segment or a rooted path. So even "./x" can potentially be used --- as a segment which can just mean "x". --- --- XXX For the untyped Path we can allow appending "./x" to other paths. We can --- leave this to the programmer. In typed paths we can allow "./x" in segments. +-- segments or rooted. A leading "." is treated as just a path segment that +-- happens to refer to the current directory; "./x" is equivalent to "x" and +-- both are considered unrooted. A path is only rooted if it has an explicit +-- absolute or drive-style root (leading separator, drive letter, share name, +-- etc.). +-- -- XXX Empty path can be taken to mean "." except in case of UNC paths isRooted :: (Unbox a, Integral a) => OS -> Array a -> Bool isRooted Posix a = hasLeadingSeparator Posix a - || isRelativeCurDir Posix a isRooted Windows a = hasLeadingSeparator Windows a - || isRelativeCurDir Windows a || hasDrive a -- curdir-in-drive relative, drive absolute isBranch :: (Unbox a, Integral a) => OS -> Array a -> Bool @@ -895,6 +683,13 @@ unsafeSplitUNC arr = {-# INLINE splitRoot #-} splitRoot :: (Unbox a, Integral a) => OS -> Array a -> (Array a, Array a) +-- NOTE: 'splitRoot' is an internal structural operation; for the purposes of +-- equality/normalisation it still splits a leading "." component (e.g. "./x") +-- off as a root so that the 'allowRelativeEquality' machinery can distinguish +-- "./x" from "x". Note that 'isRooted' does /not/ classify such a path as +-- rooted; the user-facing 'splitRoot' wrappers in 'PosixPath'/'WindowsPath' +-- map this case to 'Nothing'. +-- -- NOTE: validatePath depends on splitRoot splitting the path without removing -- any redundant chars etc. It should just split and do nothing else. -- XXX We can put an assert here "arrLen == rootLen + stemLen". 
@@ -903,7 +698,7 @@ splitRoot :: (Unbox a, Integral a) => OS -> Array a -> (Array a, Array a) -- NOTE: we cannot drop the trailing "/" on the root even if we want to - -- because "c:/" will become "c:" and the two are not equivalent. splitRoot Posix arr - | isRooted Posix arr + | isRooted Posix arr || isRelativeCurDir Posix arr = unsafeSplitLeadingSep Posix arr | otherwise = (Array.empty, arr) splitRoot Windows arr @@ -1030,6 +825,8 @@ splitPath_ => OS -> Array a -> Stream m (Array a) splitPath_ = splitPathUsing False False +-- XXX It should not normalize the Windows verbatim paths. We should simply +-- split it as it is. {-# INLINE splitPath #-} splitPath :: (Unbox a, Integral a, Monad m) @@ -1376,10 +1173,41 @@ isInvalidPathComponent = fmap (fmap charToWord) , "LPT1","LPT2","LPT3","LPT4","LPT5","LPT6","LPT7","LPT8","LPT9" ] +{- +A valid path is either: +- a valid absolute path +- a valid relative path + +A valid absolute path consists of: +- a valid absolute root +- optionally followed by a valid relative path + +A valid relative path: +- does not begin with an absolute root +- on Windows may begin with a relative root ("/" or "c:") +- consists of path segments separated by path separators +- contains no disallowed characters in any segment + +Note: A leading "." (e.g. "." or "./x") is /not/ treated as a root; "." is +an ordinary path segment referring to the current directory. + +A valid absolute root: +- is a valid path +- has no parent + +Note: +- On Windows, "/" is a drive-relative root, not an absolute root. +- On Windows, "c:" is drive-relative, while "c:/" is absolute. + +A generic way to validate a path is -- split it lexically on the separator and +then examine each component including the trailing separator. +-} + +-- | A valid root, share root or a valid path. 
{- HLINT ignore "Use when" -} -validatePathWith :: (MonadThrow m, Integral a, Unbox a) => - Bool -> OS -> Array a -> m () -validatePathWith _ Posix path = +validatePath :: (MonadThrow m, Integral a, Unbox a) => + OS -> Array a -> m () +validatePath Posix path = let pathLen = Array.length path validLen = countLeadingValid Posix path in if pathLen == 0 @@ -1388,7 +1216,7 @@ validatePathWith _ Posix path = then throwM $ InvalidPath $ "Null char found after " ++ show validLen ++ " characters." else pure () -validatePathWith _allowRoot Windows path +validatePath Windows path | Array.null path = throwM $ InvalidPath "Empty path" | otherwise = do if hasDrive path && postDriveSep > 1 -- "C://" @@ -1482,15 +1310,6 @@ validatePathWith _allowRoot Windows path invalidComponent = List.any (`List.elem` isInvalidPathComponent) (components stem) --- | A valid root, share root or a valid path. -{-# INLINE validatePath #-} -validatePath :: (MonadThrow m, Integral a, Unbox a) => OS -> Array a -> m () -validatePath = validatePathWith True - -{-# INLINE validatePath' #-} -validatePath' :: (MonadThrow m, Integral a, Unbox a) => OS -> Array a -> m () -validatePath' = validatePathWith False - {-# INLINE unsafeFromArray #-} unsafeFromArray :: Array a -> Array a unsafeFromArray = id @@ -1701,43 +1520,7 @@ joinRootBody os root body | otherwise = doAppend os root body ------------------------------------------------------------------------------ --- Normalization and comparison of paths ------------------------------------------------------------------------------- - --- Windows literal paths --- --------------------- --- --- Windows "Literal" Paths (\\?\): When you prefix a path with \\?\, you are --- telling the Windows APIs to turn off all "normalization". --- --- Object Manager Paths: On Windows, paths like \??\C:\ or --- \Device\HarddiskVolume1\ have very specific rules about separators. 
--- --- We should splitPathRaw instead of splitPath on such paths to be able to --- reconstruct the path back if needed. --- --- POSIX // --- -------- --- --- On POSIX a path starting with exactly two slashes ("//x") is --- implementation-defined. --- --- See https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html --- --- If a pathname begins with two successive characters, the first --- component following the leading characters may be interpreted in an --- implementation-defined manner, although more than two leading --- characters shall be treated as a single character. --- --- This is rarely or historically used on Posix but may be of importance in --- portable cygwin style paths where a UNC path \\server\share\file gets --- converted to Posix style //server/share/file . --- --- If we want this behavior on Posix we can treat the path as a Windows path --- and use Windows path operations on it. - ------------------------------------------------------------------------------- --- Building blocks +-- Normalization: Building blocks ------------------------------------------------------------------------------ -- NOTE: splitPath already cleans up redundant separators and dot components, diff --git a/core/src/Streamly/Internal/FileSystem/PosixPath.hs b/core/src/Streamly/Internal/FileSystem/PosixPath.hs index c2f7d14393..50eed09942 100644 --- a/core/src/Streamly/Internal/FileSystem/PosixPath.hs +++ b/core/src/Streamly/Internal/FileSystem/PosixPath.hs @@ -6,6 +6,7 @@ -- file, generating a haddock warning. -- -- See Internal.FileSystem.Path for module level docs. +-- See docs/Developer/FileSystem.Path.md for design doc. -- -- This file is preprocessed and included in Internal.FileSystem.Path module. -- The preprocessor replaces the macros by OS specific values. 
OS_PATH_TYPE @@ -560,8 +561,12 @@ addTrailingSeparator p@(OS_PATH _arr) = -- False -- >>> isValid "\\\\x\\" -- False +-- >>> isValid "\\\\server\\" +-- False -- >>> isValid "\\\\x\\y" -- True +-- >>> isValid "\\\\server\\x" +-- True -- >>> isValid "//x/y" -- True -- >>> isValid "\\\\prn\\y" @@ -596,6 +601,8 @@ addTrailingSeparator p@(OS_PATH _arr) = -- True -- >>> isValid "\\\\?\\UNC\\x" -- server x but no share -- False +-- >>> isValid "\\\\?\\UNC\\server" +-- False -- >>> isValid "\\\\?\\UNC\\c:\\x" -- True -- @@ -896,9 +903,9 @@ AS_OS_CSTRING p = Array.asNullTerminatedPtr (toArray p) ------------------------------------------------------------------------------ #ifndef IS_WINDOWS --- | A path that is attached to a root e.g. "\/x" or ".\/x" are rooted paths. --- "\/" is considered an absolute root and "." as a dynamic root. ".." is not --- considered a root, "..\/x" or "x\/y" are not rooted paths. +-- | A path that is attached to a root e.g. "\/x" is a rooted path. "\/" is an +-- absolute root. On Posix a rooted path is same as an absolute path. A rooted +-- path cannot be appended to any other path on Posix. -- -- >>> isRooted = Path.isRooted . Path.fromString_ -- @@ -906,25 +913,30 @@ AS_OS_CSTRING p = Array.asNullTerminatedPtr (toArray p) -- True -- >>> isRooted "/x" -- True --- >>> isRooted "." --- True --- >>> isRooted "./x" --- True -- isRooted :: OS_PATH_TYPE -> Bool isRooted (OS_PATH arr) = Common.isRooted Common.OS_NAME arr #endif --- | A path that is not attached to a root e.g. @a\/b\/c@ or @..\/b\/c@. +-- | A path that is not attached to a root. An unrooted path can always be +-- appended to any other path. +-- +-- Definition: -- -- >>> isUnrooted = not . Path.isRooted -- +-- Examples: +-- -- >>> isUnrooted = Path.isUnrooted . Path.fromString_ -- -- >>> isUnrooted "x" -- True -- >>> isUnrooted "x/y" -- True +-- >>> isUnrooted "." +-- True +-- >>> isUnrooted "./x" +-- True -- >>> isUnrooted ".." 
-- True -- >>> isUnrooted "../x" @@ -1067,32 +1079,34 @@ unsafeJoinPaths = undefined -- >>> split "/" -- Just ("/",Nothing) -- --- >>> split "." --- Just (".",Nothing) --- --- >>> split "./" --- Just ("./",Nothing) --- -- >>> split "/home" -- Just ("/",Just "home") -- -- >>> split "//" -- Just ("//",Nothing) -- +-- A leading @.@ component is not treated as a root: +-- +-- >>> split "." +-- Nothing +-- +-- >>> split "./" +-- Nothing +-- -- >>> split "./home" --- Just ("./",Just "home") +-- Nothing -- -- >>> split "home" -- Nothing -- splitRoot :: OS_PATH_TYPE -> Maybe (OS_PATH_TYPE, Maybe OS_PATH_TYPE) -splitRoot (OS_PATH x) = - let (a,b) = Common.splitRoot Common.OS_NAME x - in if Array.null a - then Nothing - else if Array.null b - then Just (OS_PATH a, Nothing) - else Just (OS_PATH a, Just (OS_PATH b)) +splitRoot (OS_PATH x) + | not (Common.isRooted Common.OS_NAME x) = Nothing + | otherwise = + let (a,b) = Common.splitRoot Common.OS_NAME x + in if Array.null b + then Just (OS_PATH a, Nothing) + else Just (OS_PATH a, Just (OS_PATH b)) -- | Split the path components keeping separators between path components -- attached to the dir part. Redundant separators are removed, only the first @@ -1430,8 +1444,12 @@ pathDepth (OS_PATH p) = runIdentity $ Stream.fold Fold.length $ Common.splitPath_ Common.OS_NAME p - -- XXX should be just (n-1) when rooted, n can never be <= 0 - in if Common.isRooted Common.OS_NAME p then max 0 (n - 1) else n + -- splitPath_ produces a leading root segment for any path whose + -- structural splitRoot is non-empty (this includes a leading "."). + -- Subtract one for that segment so a path like "." or "./x" has + -- depth 0 or 1 respectively, not 1 or 2. + (root, _) = Common.splitRoot Common.OS_NAME p + in if Array.null root then n else max 0 (n - 1) -- | Extracts the file name component (with extension) from a OS_PATH_TYPE, if -- present. 
@@ -1636,7 +1654,9 @@ allowRelativeEquality val conf = conf { _allowRelativeEquality = val } -- True -- -- Relative paths compare equal by default; pass --- @'allowRelativeEquality' False@ to require both paths to be absolute: +-- @'allowRelativeEquality' False@ to require both paths to be absolute. A +-- leading @.\/@ is just a redundant @.@ segment and compares equal to the +-- same path without it: -- -- >>> eq "." "." -- True @@ -1644,6 +1664,10 @@ allowRelativeEquality val conf = conf { _allowRelativeEquality = val } -- True -- >>> eq "./x" "x" -- True +-- >>> eq "./.." ".." +-- True +-- >>> eq "./../x" "../x" +-- True -- -- Trailing separators are significant by default: -- @@ -1866,10 +1890,13 @@ takeCommonPrefix cfg (OS_PATH a) (OS_PATH b) = -- NoCommonPrefix, and NotProperPrefix. #ifndef IS_WINDOWS --- | Strip a prefix from a path at a path segment boundary. Returns the --- remaining suffix if the first argument is a prefix of the second, or --- 'Nothing' if it is not or if stripping the prefix leaves an empty remainder --- (i.e. the prefix equals the full path). +-- | If all the components of the prefix path match the leading components +-- of the second path, strip those components from the second path and +-- return 'Just' the remainder, otherwise return 'Nothing'. If there is +-- no remainder then return 'Nothing'. +-- +-- This function essentially makes the second path relative to the first except +-- that it does not introduce ".." components. -- -- The prefix is compared using the supplied 'EqCfg' normalisation: redundant -- separators and @.@ components are removed before matching. 
@..@ components @@ -1880,29 +1907,29 @@ takeCommonPrefix cfg (OS_PATH a) (OS_PATH b) = -- >>> f "/x" "/x/y/z" -- Just "y/z" -- --- >>> f "/x/y" "/x/y/z" --- Just "z" +-- >>> f "/" "/x" +-- Just "x" -- -- Prefix not present: -- --- >>> f "/a" "/x/y" +-- >>> f "/x" "/y" -- Nothing -- --- Prefix equals full path, leaving empty remainder: +-- Both the paths are equal: -- -- >>> f "/x/y" "/x/y" -- Nothing -- --- Redundant separators in the prefix are normalised before matching: +-- Redundant separators are normalised: -- -- >>> f "/x//y" "/x/y/z" -- Just "z" -- #else --- | Strip a prefix from a path at a path segment boundary. Returns the --- remaining suffix if the first argument is a prefix of the second, or --- 'Nothing' if it is not or if stripping the prefix leaves an empty remainder --- (i.e. the prefix equals the full path). +-- | If all the components of the prefix path match the leading components +-- of the second path, strip those components from the second path and +-- return 'Just' the remainder, otherwise return 'Nothing'. If there is +-- no remainder then return 'Nothing'. -- -- The prefix is compared using the supplied 'EqCfg' normalisation. The drive -- letter is matched case-insensitively. 
Verbatim @\\\\?\\@ paths are matched @@ -1913,14 +1940,22 @@ takeCommonPrefix cfg (OS_PATH a) (OS_PATH b) = -- >>> f "C:\\x" "C:\\x\\y\\z" -- Just "y\\z" -- --- Drive letter case differs but the drive matches: +-- Drive letter case differs: -- --- >>> f "c:\\x" "C:\\x\\y" --- Just "y" +-- >>> f "c:\\" "C:\\x" +-- Just "x" -- -- Prefix not present: -- --- >>> f "C:\\a" "C:\\x\\y" +-- >>> f "C:\\a" "C:\\x" +-- Nothing +-- +-- >>> f "C:" "C:\\x" +-- Nothing +-- +-- Both the paths are equal: +-- +-- >>> f "/x/y" "/x/y" -- Nothing -- #endif diff --git a/core/src/Streamly/Internal/FileSystem/PosixPath/Seg.hs b/core/src/Streamly/Internal/FileSystem/PosixPath/Seg.hs index b84e45c1b0..fc3e7e60d8 100644 --- a/core/src/Streamly/Internal/FileSystem/PosixPath/Seg.hs +++ b/core/src/Streamly/Internal/FileSystem/PosixPath/Seg.hs @@ -18,16 +18,22 @@ -- Portability : GHC -- -- This module provides a type safe path append operation by distinguishing --- paths between rooted paths and branches. Rooted paths are represented by the --- @Rooted OS_PATH@ type and branches are represented by the @Unrooted OS_PATH@ --- type. Rooted paths are paths that are attached to specific roots in the file --- system. Rooted paths could be absolute or relative e.g. @\/usr\/bin@, --- @.\/local\/bin@, or @.@. Unrootedes are a paths that are not attached to a --- specific root e.g. @usr\/bin@, @local\/bin@, or @../bin@ are branches. +-- between rooted paths and unrooted branches. Rooted paths are represented +-- by the @Rooted OS_PATH@ type and unrooted branches are represented by the +-- @Unrooted OS_PATH@ type. +-- +-- Rooted paths are attached to specific filesystem roots or anchors. On +-- Posix, rooted paths are absolute paths such as @/usr/bin@. On Windows, +-- rooted paths may be absolute or partially constrained, for example +-- @C:\x@, @C:x@, or @\x@. +-- +-- Unrooted paths are not attached to a specific root and behave as +-- appendable path branches. 
Examples include @usr/bin@, @local/bin@, +-- @./bin@, and @../bin@. -- -- This distinction provides a safe path append operation which cannot fail. --- These types do not allow appending a rooted path to any other path. Only --- branches can be appended. +-- Rooted paths cannot be appended to other paths; only unrooted branches may +-- be appended. -- module Streamly.Internal.FileSystem.OS_PATH.Seg ( diff --git a/core/src/Streamly/Internal/FileSystem/WindowsPath.hs b/core/src/Streamly/Internal/FileSystem/WindowsPath.hs index 28938ed50c..1b7078eeab 100644 --- a/core/src/Streamly/Internal/FileSystem/WindowsPath.hs +++ b/core/src/Streamly/Internal/FileSystem/WindowsPath.hs @@ -2,6 +2,8 @@ #define IS_WINDOWS #include "Streamly/Internal/FileSystem/PosixPath.hs" +-- See docs/Developer/FileSystem.Path.md for design doc. +-- -- XXX Move these functions to PosixPath.hs and use CPP conditionals for -- documentation differences, definitions are identical. @@ -18,26 +20,13 @@ -- | Like 'validatePath' but more strict. Currently equivalent to -- 'validatePath' on Windows; reserved for future stricter checks. --- --- >>> isValid = isJust . Path.validatePath' . Path.encodeString --- --- >>> isValid "\\\\" --- False --- >>> isValid "\\\\server\\" --- False --- >>> isValid "\\\\server\\x" --- True --- >>> isValid "\\\\?\\UNC\\server" --- False --- +{-# DEPRECATED validatePath' "Please use 'validatePath' instead." #-} validatePath' :: MonadThrow m => Array OS_WORD_TYPE -> m () -validatePath' = Common.validatePath' Common.Windows +validatePath' = Common.validatePath Common.Windows -- | Like 'isValidPath' but more strict. --- --- >>> isValidPath' = isJust . Path.validatePath' --- +{-# DEPRECATED isValidPath' "Please use 'isValidPath' instead." #-} isValidPath' :: Array OS_WORD_TYPE -> Bool isValidPath' = isJust . validatePath' @@ -54,9 +43,10 @@ isValidPath' = isJust . validatePath' readArray :: [Char] -> OS_PATH_TYPE readArray = fromJust . fromArray . 
read --- | A path that is attached to a root. "C:\\" is considered an absolute root --- and "." as a dynamic root. ".." is not considered a root, "..\/x" or "x\/y" --- are not rooted paths. +-- | A path that is attached to a root. "C:\\" is considered an absolute root. +-- A leading @.@ is /not/ considered a root: "." and ".\\x" are unrooted, +-- equivalent to (an empty path and) "x". ".." is also not a root, "..\/x" or +-- "x\/y" are not rooted paths. -- -- Absolute locations: -- @@ -70,10 +60,18 @@ readArray = fromJust . fromArray . read -- Relative locations: -- -- * @\\@ relative to current drive root --- * @.\\@ relative to current directory -- * @C:@ current directory in drive -- * @C:file@ relative to current directory in drive -- +-- Note: @C:@ refers to the current directory in drive @C:@ and so is +-- conceptually a directory, but unlike other directories we cannot write +-- it with a trailing separator: @C:\\@ means the /absolute/ root of drive +-- @C:@, not the current directory. This is the one place in our path model +-- where the usual "directories carry a trailing separator" convention does +-- not apply. To explicitly write "current directory in drive @C:@" with a +-- trailing separator, use @C:.\\@ (which is equivalent to @C:@ under +-- 'eqPath'). +-- -- >>> isRooted = Path.isRooted . fromJust . Path.fromString -- -- Common to Windows and Posix: @@ -83,9 +81,9 @@ readArray = fromJust . fromArray . read -- >>> isRooted "/x" -- True -- >>> isRooted "." --- True +-- False -- >>> isRooted "./x" --- True +-- False -- -- Windows specific: -- @@ -267,6 +265,20 @@ joinDir -- >>> eq "c:x" "c:x" -- True -- +-- @C:@ (with no following separator) refers to the current directory in +-- drive @C:@, so a leading @.\\@ after the drive is redundant. Note that +-- @C:@ cannot carry a trailing separator (see the module note above), so +-- we compare against @C:.@ rather than @C:.\\@ here: +-- +-- >>> eq "c:bin" "c:./bin" +-- True +-- +-- >>> eq "c:" "c:." 
+-- True +-- +-- >>> eq "c:.." "c:./.." +-- True +-- -- Pass @'allowRelativeEquality' False@ to require absolute paths for -- equality: -- @@ -352,13 +364,13 @@ eqPath cfg (OS_PATH a) (OS_PATH b) = -- Just ("//x/y/",Just "z") -- splitRoot :: OS_PATH_TYPE -> Maybe (OS_PATH_TYPE, Maybe OS_PATH_TYPE) -splitRoot (OS_PATH x) = - let (a,b) = Common.splitRoot Common.OS_NAME x - in if Array.null a - then Nothing - else if Array.null b - then Just (OS_PATH a, Nothing) - else Just (OS_PATH a, Just (OS_PATH b)) +splitRoot (OS_PATH x) + | not (Common.isRooted Common.OS_NAME x) = Nothing + | otherwise = + let (a,b) = Common.splitRoot Common.OS_NAME x + in if Array.null b + then Just (OS_PATH a, Nothing) + else Just (OS_PATH a, Just (OS_PATH b)) -- | Split a path into components separated by the path separator. "." -- components in the path are ignored except when in the leading position. diff --git a/docs/Developer/FileSystem.DirIO.md b/docs/Developer/FileSystem.DirIO.md index 462cd91021..f7235f0e53 100644 --- a/docs/Developer/FileSystem.DirIO.md +++ b/docs/Developer/FileSystem.DirIO.md @@ -12,6 +12,12 @@ needed. FileIO module provides regular file create operation. +## File System as Tree vs Graph + +A file system is a tree when there are no hard links or symbolic links. But +in the presence of symlinks it could be a DAG or a graph, because directory +symlinks can create cycles. + ## Traversal vs Output control There are two dimensions to recursive directory reading APIs, traversal and diff --git a/docs/Developer/FileSystem.Path.Design.rst b/docs/Developer/FileSystem.Path.Design.rst new file mode 100644 index 0000000000..1cd4364788 --- /dev/null +++ b/docs/Developer/FileSystem.Path.Design.rst @@ -0,0 +1,552 @@ +WindowsPath and PosixPath Modules +--------------------------------- + +We should be able to manipulate windows paths on posix and posix paths on +windows as well. Therefore, we have WindowsPath and PosixPath types which +are supported on both platforms. 
However, the Path module aliases Path to +WindowsPath on Windows and PosixPath on Posix. + +PATH CLASSIFICATION +------------------- + +In general paths can be divided into the following categories. + +RelPath + RelFree -- x, ./x -- both curdir and curdrive are unspecified + RelFixedDrive -- C:x -- On windows, drive specified, path relative to current dir on that drive + RelFixedDir -- \x -- On windows, absolute path on current drive (root-relative) + +RelFixedDrive and RelFixedDir are Windows only cases. +When we split a relative path into a head and stem we get ".", "c:", "/" as +heads when the path is RelFree, RelFixedDrive and RelFixedDir respectively. + +AbsPath -- fully specified path +On Posix, it is simple: "/x" +On Windows, there are multiple possibilities: + AbsDrive -- C:\x + AbsUNC -- \\server\share\x + AbsVerbatim -- \\?\... + AbsDevice -- \Device\... + +On Posix RelPath and AbsPath do not have any further classification. + +When appending paths, on Windows, do not insert a separator after a bare +drive (C:). For all practical purposes a bare "C:" can be treated as "C:./" +and then we do not need this special treatment wrt separators. + + C: x -> C:x + +PATH NAVIGATION SEMANTICS (follow) +---------------------------------- + +Most libraries (python, rust) including haskell filepath use the path +navigation semantics when composing paths ("</>" in filepath). + +The "follow" operation navigates the first path and then the second. In +other words, "follow p1 p2" interprets p2 in the context of p1. + +Operationally: + cd (follow p1 p2) == cd p1; cd p2 + +That is, p2 is resolved relative to the location denoted by p1. +The two paths denote a sequence of resolution operations, we resolve p1 and +then we resolve p2 with respect to p1. + +Note that this operation is total and never results in an error. + +Rules: + +1.
If p2 is Relative: + + Absolute Relative -> Absolute + Relative Relative -> Relative + RelFixed Relative -> RelFixed + + (p2 is appended to p1) + +2. If p2 is Absolute: + + Any Absolute -> Absolute (p2 wins) + +3. If p2 is RelFixedDrive (C:y), if the drive is the same then combine +otherwise take the second path. If the drive is not specified then it is +considered to be different. + + C: C:y -> C:y -- C: equiv C:./ + C:x C:y -> C:x/y + C:/x C:y -> C:/x/y + D:x C:y -> C:y + + /x C:y -> C:y + x C:y -> C:y + + The "cd" semantics can be incorrect for the last two if we assume the + drive of the first path to be same as the second. + +4. If p2 is RelFixedDir (\y), discard LHS, if LHS has drive keep the drive: + + C: \y -> C:\y -- C: equiv C:./ + C:/ \y -> C:\y + C:/x \y -> C:\y + C:x \y -> C:\y + \x \y -> \y + x \y -> \y + + For the first 3 cases above, UNC behaves the same as a drive root: + + \\server\share\x \y -> \\server\share\y + +These are based on how python 'ntpath' module behaves. + +PATH CONSTRUCTION SEMANTICS (append) +------------------------------------ + +In Streamly path module the path "append" operation uses the path construction +semantics rather than path navigation semantics. + +"append" operation constructs paths structurally. The second argument must +be such that it can be interpreted relative to the first. While "follow" is +total, "append" is partial and can result in runtime errors. + +"append p r" extends path p suffixing the segments of r. + +1. Always valid if path being appended is fully relative: + + appendRel :: RelPath -> RelPath -> RelPath + appendAbs :: AbsPath -> RelPath -> AbsPath + +2. Never valid if r is AbsPath: + + / /x -> error -- can be allowed, but no exception + p AbsPath -> error + +3. Identity: + + "." is the empty relative path, it is identity of composition: + + appendAbs p "." == p + appendRel p "." == p + appendRel "." p == p + appendRelFixed p "." == p + +4. 
Associativity (via RelPath): + + append (append p a) b == append p (a <> b) + +Notes: + +- "." is not an anchor; it is the identity element of relative paths. +- On Windows AnchoredPath can only start with "\" or "C:", it cannot start +with "C:\" as that would make it an AbsPath. + +SUMMARY: append and follow +-------------------------- + +follow = resolution (contextual, may override) +append = construction (structural, no override) + +follow models filesystem navigation semantics +append models path construction semantics + +Appending Anchored Paths +------------------------ + +We provided a simple append algebra above, however, it gets complicated when +Windows anchored paths are considered. + +If second path is RelFixed, and has the same Anchor as the first path, then +strip the Anchor into a Maybe Drive and a Relative or "/" Anchored path and +then apply the same rules as above considering the "/" Anchored path as +absolute. + +1. If the second path is RelFixedDrive (C:y), if both the paths have drive +and it is the same then combine otherwise it is runtime error. + + C:\x C:y -> C:\x\y + C: C:y -> C:y -- C: equiv C:./ + C:x C:y -> C:x\y + + D:x C:y -> error + \x C:y -> error + x C:y -> error + +2. If the second path is RelFixedDir (\y). It is absolute within the +drive, therefore, similar to the absolute path rules, not allowed. + + C:\ \y -> error -- can be allowed, but this will be an exception + C:\x \y -> error + C: \y -> error -- C: is equiv C:. which is a relative path + C:x \y -> error + \x \y -> error + x \y -> error + + For the first 3 cases above, UNC behaves the same as a drive root: + +Typed paths +----------- + +Posix is simple but if we consider the Windows cases our algebra becomes +complicated. 
Cases that are allowed :: + + appendRel :: AnyPath -> RelFree -> AnyPath + appendFixedDrive :: RelFixedDrive -> RelFixedDrive -> RelFixedDrive + + where AnyPath => AbsPath, RelPath, RelFixedDrive, RelFixedDir + +Cases that are not allowed:: + + appendFixedDrive :: AnyExceptRelFixedDrive -> RelFixedDrive -> Error + appendFixedDir :: AnyPath -> RelFixedDir -> Error + +We see that the additional Windows anchored paths behave more like AbsPath when +composing. So they fall in the AbsPath bucket. + +Rooted Paths +------------ + +To keep the types and algebra simple we extend the concept of AbsPath to +RootedPath which is a path which may have some sort of root or anchor attached +to it. This includes absolute paths as well as the Windows anchored paths:: + + appendUnrooted :: Unrooted -> Unrooted -> Unrooted + appendRooted :: Rooted -> Unrooted -> Rooted + +Now, if we do that we disallow some cases that are possible for anchored but +not absolute paths. These cases are very few and a bit unusual, and we can do +without allowing them as well. The cases that we have are :: + + C:\x C:y -> C:\x\y + C: C:y -> C:y -- C: equiv C:./ + C:x C:y -> C:x\y + +To allow these cases we can provide a "combine" operation to combine Rooted +paths that have a common anchor and the path is relative to that anchor where +one more dimension of the path is free to change, and can be combined (e.g. +drive is fixed but current directory can change).:: + + combineRooted :: Rooted -> Rooted -> Maybe Rooted + +This operation can fail if the path does not have a free dimension that allows +it to combine or the root is not the same. 
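The append algebra above can be sketched with a toy model. This is only an illustration under stated assumptions, not the Streamly implementation: the `Root` constructors, the list-of-segments representation and `render` are hypothetical names introduced here, and `combineRooted` simply encodes the three allowed cases listed above (second path is drive-relative and names the same drive as the first).

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- Hypothetical toy model: a root tag plus a list of path segments.
data Root
    = DriveRel Char   -- "C:"  current directory on a drive
    | DriveAbs Char   -- "C:\" absolute root of a drive
    | CurDriveRoot    -- "\"   root of the current drive
    deriving (Eq, Show)

newtype Unrooted = Unrooted [String] deriving (Eq, Show)
data Rooted = Rooted Root [String] deriving (Eq, Show)

-- Appending an unrooted branch is total: it can never fail.
appendUnrooted :: Unrooted -> Unrooted -> Unrooted
appendUnrooted (Unrooted a) (Unrooted b) = Unrooted (a ++ b)

appendRooted :: Rooted -> Unrooted -> Rooted
appendRooted (Rooted r a) (Unrooted b) = Rooted r (a ++ b)

-- Drive named by a root, upper-cased so that "c:" and "C:" match.
driveOf :: Root -> Maybe Char
driveOf (DriveRel d) = Just (toUpper d)
driveOf (DriveAbs d) = Just (toUpper d)
driveOf CurDriveRoot = Nothing

-- Combining two rooted paths is partial: it succeeds only when the second
-- path is drive-relative (C:y) and the first path names the same drive.
combineRooted :: Rooted -> Rooted -> Maybe Rooted
combineRooted (Rooted r1 a) (Rooted (DriveRel d2) b)
    | driveOf r1 == Just (toUpper d2) = Just (Rooted r1 (a ++ b))
combineRooted _ _ = Nothing

-- Render with backslash separators, for inspection only.
render :: Rooted -> String
render (Rooted r segs) = rootStr r ++ intercalate "\\" segs
  where
    rootStr (DriveRel d) = [d, ':']
    rootStr (DriveAbs d) = [d, ':', '\\']
    rootStr CurDriveRoot = "\\"
```

In this model `C:\x` combined with `C:y` yields `C:\x\y`, while `D:x` with `C:y`, or any path with `\y`, yields `Nothing`, which is where the real API would report an error.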
+ +How will the splitRoot operation behave when considering anchored paths:: + + splitRoot :: Rooted -> (Rooted, Unrooted) + -- Posix + splitRoot "/" => ("/", ".") + splitRoot "/x" => ("/", "x") + -- Windows + splitRoot "/" => ("/", ".") + splitRoot "/x" => ("/", "x") + splitRoot "C:" => ("C:", ".") + splitRoot "C:/" => ("C:/", ".") + splitRoot "//server/share" => ("//server/share", ".") + +To combine Rooted paths, split the root first and combine the Unrooted paths if +the root is common and not absolute drive or absolute dir in a drive. + +Examples of Rooted: "/", "/x", "C:", "C:x", "C:/", "//x/y". +Examples of Unrooted: "x", "x/y", ".", "./x", "..", "../x". + +------------------------------------------------------------------------------ +Naming Summary +------------------------------------------------------------------------------ + +Path classification is divided along two orthogonal dimensions: + +* Rootedness: ``Rooted`` / ``Unrooted`` +* Node type: ``File`` / ``Dir`` + +The modules corresponding to these dimensions are: + +* ``Streamly.FileSystem.Path.Rooted`` +* ``Streamly.FileSystem.Path.FileDir`` +* ``Streamly.FileSystem.Path.Typed`` + +The ``Typed`` module combines both dimensions, allowing types such as:: + + Rooted File + Rooted Dir + Unrooted File + Unrooted Dir + + +------------------------------------------------------------------------------ +Rooted / Unrooted +------------------------------------------------------------------------------ + +We considered several alternatives: + +* ``RootedPath`` / ``UnrootedPath`` +* ``AbsPath`` / ``RelPath`` +* ``Anchored`` / ``Branch``, ``Segment`` + +``AbsPath``/``RelPath`` are concise and familiar, but on Windows we also need +to classify constrained paths like:: + + C:x + \x + +These paths are rooted/constrained but not truly absolute because they still +depend on ambient process state such as the current directory or current drive. 
+ +Treating such paths as ``AbsPath`` weakens the conventional meaning of +"absolute", and leaves no stronger term for paths that are fully anchored and +context-independent. + +Therefore we use ``Rooted``/``Unrooted``: + +``Rooted`` + + Paths with anchoring semantics. These may be fully absolute or partially + constrained. + +``Unrooted`` + + Pure appendable path branches with no anchoring semantics. + +This terminology: + +* preserves the conventional meaning of "absolute" +* thus allows using (isAbsolute Rooted) +* expresses Windows path semantics +* matches the append algebra +* keeps path append total and type-safe + + +------------------------------------------------------------------------------ +Why not use the ``Path`` suffix? +------------------------------------------------------------------------------ + +We considered names like:: + + RootedPath + UnrootedPath + +However, these names become verbose and repetitive when composing orthogonal +path dimensions:: + + RootedPath FilePath + UnrootedPath DirPath + +Since these types are already defined within the +``Streamly.FileSystem.Path`` hierarchy, the additional ``Path`` suffix does +not add much information. + +Using shorter modifier-style names keeps the type algebra concise and easy to +read:: + + Rooted File + Unrooted Dir + +This style also scales naturally when combining multiple orthogonal path +dimensions. + + +------------------------------------------------------------------------------ +Module Names +------------------------------------------------------------------------------ + +``Rooted`` + + Contains the rooted/unrooted distinction. + +``FileDir`` + + Contains the file/directory distinction. + +``Typed`` + + Fully typed paths, combines the rootedness and node-type dimensions. 
+ +We also considered: + +* ``Rooted``: ``AbsRel``, ``Seg`` +* ``FileDir``: ``Node``, ``Kind`` + +but ``Rooted``/``FileDir``/``Typed`` were chosen because they use familiar +filesystem terminology and avoid introducing abstract or type-theoretic +vocabulary into the public API. + +Comparing Relative Paths +------------------------ + +We can compare two absolute rooted paths or path branches but we cannot +compare two relative rooted paths if the implicit meaning of the roots +may be different or contextual. If each component of two unrooted paths +are equal then the paths are considered to be equal. + +Implicit Rooted Paths (. as root) +--------------------------------- + +The following is a possible strict way of treating implicitly rooted relative +paths, but we are not doing this because this may become surprising and go +against the established intuition. + +The special path component "." implicitly refers to the current directory. On +Windows a path like @/Users/@ has the drive reference implicit. Such references +are contextual and may have different meanings at different times. + +@./bin@ may refer to a different location depending on what "." is referring +to. Thus ideally we should not allow @./bin@ to be appended to another path, +@bin@ can be appended though. Similarly, we cannot compare @./bin@ with @./bin@ +and say that they are equal because they may be referring to different +locations depending on in what context the paths were created. + +The same arguments apply to paths with implicit drive on Windows. + +Strictly speaking @.\/bin\/ls@ can be treated as an absolute path with +"." as an implicit dynamic root. On the other hand "bin/ls" is relative +path which represents steps from somewhere to somewhere else rather than +a particular location. We can also call @./bin@ as a "rooted path" as it +starts at a particular location rather than defining "steps" to go from +one place to another. 
If we want to append such paths we need to first +make them explicitly relative by dropping the implicit root. Or we can +use unsafeAppend to force it anyway or unsafeCast to convert absolute to +relative. + +If we compare these absolute/located paths having implicit roots then result +should be EqUnknown or maybe we can just return False?. @./bin@ and @./bin@ +should be treated as paths with different roots/drives but same relative path. +These paths can be compared as equal by enabling relative equality via flag. + +If we treat "." as a dynamic root then, ./bin and bin are not the same, +similarly ./.. and .. are not the same; these may be surprising unless +one is familiar with this model. + +Note that a trailing . or a . in the middle of a path is different as it +refers to a known name. + +Normalizing Paths With (..) +--------------------------- + +".." in a path refers to the parent directory relative to the current path. +For an absolute root directory ".." refers to the root itself because you +cannot go further up. + +When resolving ".." it always resolves to the parent of a directory as +stored in the directory entry. So if we landed in a directory via a symlink, +".." can take us back to a different directory and not to the symlink +itself. Thus @a\/b/..@ may not be the same as @a/@. Shells like bash keep +track of the old paths explicitly, so you may not see this behavior when +using a shell. + +For this reason we cannot process ".." in the path statically. However, if +the components of two paths are exactly the same then they will always +resolve to the same target. But two paths with different components could +also point to the same target. So if there are ".." in the path we cannot +definitively say if they are the same without resolving them. 
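The consequence for comparison can be sketched as follows. This is a minimal illustration for Posix-style strings, not the library's `eqPath`: the names `Norm`, `normalise` and `eqLexical` are hypothetical, and the sketch assumes that redundant separators, "." components and trailing separators are ignored while ".." components are kept verbatim, since they cannot be resolved without consulting the file system.

```haskell
-- Hypothetical normal form: whether the path starts with '/', followed by
-- its significant components.
data Norm = Norm Bool [String] deriving (Eq, Show)

-- Split on '/' without interpreting any component.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
    (c, [])       -> [c]
    (c, _ : rest) -> c : splitOnSlash rest

-- Drop empty components (redundant separators) and "." components, but
-- keep ".." untouched: "a/b/.." is not rewritten to "a".
normalise :: String -> Norm
normalise p = Norm (take 1 p == "/") (filter keep (splitOnSlash p))
  where
    keep c = not (null c) && c /= "."

-- Two paths are lexically equal when their normal forms coincide.
eqLexical :: String -> String -> Bool
eqLexical a b = normalise a == normalise b
```

With this sketch `./bin` equals `bin` and `x//y` equals `x/y`, but `a/b/..` is not considered equal to `a`, because `b` may be a symlink whose parent is a different directory.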
+ +Normalization and comparison of paths +------------------------------------- + +Windows literal paths: + +Windows "Literal" Paths (\\?\): When you prefix a path with \\?\, you are +telling the Windows APIs to turn off all "normalization". + +Object Manager Paths: On Windows, paths like \??\C:\ or +\Device\HarddiskVolume1\ have very specific rules about separators. + +POSIX // +-------- + +On POSIX a path starting with exactly two slashes ("//x") is +implementation-defined. + +See https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html + +If a pathname begins with two successive <slash> characters, the first +component following the leading <slash> characters may be interpreted in an +implementation-defined manner, although more than two leading <slash> +characters shall be treated as a single <slash> character. + +This is rarely or historically used on Posix but may be of importance in +portable cygwin style paths where a UNC path \\server\share\file gets +converted to Posix style //server/share/file . + +If we want this behavior on Posix we can treat the path as a Windows path +and use Windows path operations on it. + +Trailing separators for directories +----------------------------------- + +@C:@ refers to the current directory in drive @C:@ and so is +conceptually a directory, but unlike other directories we cannot write +it with a trailing separator: @C:\\@ means the /absolute/ root of drive +@C:@, not the current directory. This is the one place in our path model +where the usual "directories carry a trailing separator" convention does +not apply. To explicitly write "current directory in drive @C:@" with a +trailing separator, use @C:.\\@ (which is equivalent to @C:@ under +'eqPath'). + +To avoid this problem we can normalize "C:" to "C:.\" . + +Design Considerations (old) +--------------------------- + +This section is from early thoughts and may be obsolete with respect to the +current design.
+ +* Should we store path as separate components or single string with + separators? + +* Should we validate the paths returned from the file system or trust + those and use directly without any validations? Need to see if that makes + any difference to path heavy benchmarks. If we want to use it directly + then we have to store it as a single string. + +* Parameterize the low level APIs with the separator so that we can + support arbitrary separators when parsing or reconstructing paths. + +* The low level API can support path handling in trees/DAGs/Graphs in general. + For example, in trees we cannot have multiple parents of a child whereas in + DAGs that is allowed, in graphs we can have cycles. We may also need ways to + detect cycles. + +* Do we need to support arbitrarily long paths i.e. streaming of path? We do + not need that for file system paths and file system paths are limited size + and operating system anyway requires them in strict buffers. In case of + graphs if we have cycles paths can be infinite, we could generate a stream of + path and the consumer could be traversing the graph according to the + generated stream. If we want to support streaming then we have to store paths + as a stream of chunks rather than a single string. + +* In general, paths need not be strings, e.g. they can be references to + locations in memory or they can be IP addresses of nodes. At an abstract + level, paths are just a stream of tokens that represent a certain traversal. + +* Relative paths are the most general representation. At a low level, + all paths are relative, absolute paths are relative to a specified root + whereas relative paths are relative to a dynamic root which is the + current directory. + +* Windows can have the root as different drive letters. So to represent paths + with a root in general we can also store the specific root along with the + path. In case of POSIX this will always be "/". 
In general, it could be a + host name or IP address or dependent on the protocol whose path we are + representing. + +* We can parameterize the low level path type with the type of path e.g. POSIX, + WINDOWS, HTTP etc. In general, programs may have to manipulate different + types of paths at the same time. High level path types can be instantiated + using the low level type therefore they can be much simpler as desired. + +References +---------- + +Windows paths: + +* https://docs.microsoft.com/en-us/windows/win32/intl/character-sets-used-in-file-names +* https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file +* https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/62e862f4-2a51-452e-8eeb-dc4ff5ee33cc + +Related Packages +---------------- + +* https://hackage.haskell.org/package/paths +* https://hackage.haskell.org/package/path +* https://hackage.haskell.org/package/hpath +* https://hackage.haskell.org/package/filepath +* https://hackage.haskell.org/package/file-io +* https://hackage.haskell.org/package/os-string diff --git a/docs/Developer/paths.rst b/docs/Developer/FileSystem.Path.Requirements.rst similarity index 76% rename from docs/Developer/paths.rst rename to docs/Developer/FileSystem.Path.Requirements.rst index 4900499b1c..7c942f2023 100644 --- a/docs/Developer/paths.rst +++ b/docs/Developer/FileSystem.Path.Requirements.rst @@ -194,70 +194,13 @@ Requirement Summary * support URI paths and other ways to represent paths where the separator could be different. -Design Considerations ---------------------- - -* Should we store path as separate components or single string with - separators? - -* Should we validate the paths returned from the file system or trust - those and use directly without any validations? Need to see if that makes - any difference to path heavy benchmarks. If we want to use it directly - then we have to store it as a single string. 
- -* Parameterize the low level APIs with the separator so that we can - support arbitrary separators when parsing or reconstructing paths. - -* The low level API can support path handling in trees/DAGs/Graphs in general. - For example, in trees we cannot have multiple parents of a child whereas in - DAGs that is allowed, in graphs we can have cycles. We may also need ways to - detect cycles. - -* Do we need to support arbitrarily long paths i.e. streaming of path? We do - not need that for file system paths and file system paths are limited size - and operating system anyway requires them in strict buffers. In case of - graphs if we have cycles paths can be infinite, we could generate a stream of - path and the consumer could be traversing the graph according to the - generated stream. If we want to support streaming then we have to store paths - as a stream of chunks rather than a single string. - -* In general, paths need not be strings, e.g. they can be references to - locations in memory or they can be IP addresses of nodes. At an abstract - level, paths are just a stream of tokens that represent a certain traversal. - -* Relative paths are the most general representation. At a low level, - all paths are relative, absolute paths are relative to a specified root - whereas relative paths are relative to a dynamic root which is the - current directory. - -* Windows can have the root as different drive letters. So to represent paths - with a root in general we can also store the specific root along with the - path. In case of POSIX this will always be "/". In general, it could be a - host name or IP address or dependent on the protocol whose path we are - representing. - -* We can parameterize the low level path type with the type of path e.g. POSIX, - WINDOWS, HTTP etc. In general, programs may have to manipulate different - types of paths at the same time. 
High level path types can be instantiated - using the low level type therefore they can be much simpler as desired. - References ---------- Some related links found by web search: +* https://en.wikipedia.org/wiki/Path_(computing) * https://gitlab.haskell.org/ghc/ghc/issues/5218 * https://nodejs.org/fr/docs/guides/working-with-different-filesystems/ * https://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux -* https://docs.microsoft.com/en-us/windows/win32/intl/character-sets-used-in-file-names * https://beets.io/blog/paths.html - -Related Packages ----------------- - -* https://hackage.haskell.org/package/paths -* https://hackage.haskell.org/package/path -* https://hackage.haskell.org/package/hpath -* https://hackage.haskell.org/package/filepath -* https://hackage.haskell.org/package/file-io -* https://hackage.haskell.org/package/os-string diff --git a/test/Streamly/Test/FileSystem/PosixPath.hs b/test/Streamly/Test/FileSystem/PosixPath.hs index 1c47bd129e..6e3c59c0cb 100644 --- a/test/Streamly/Test/FileSystem/PosixPath.hs +++ b/test/Streamly/Test/FileSystem/PosixPath.hs @@ -77,8 +77,8 @@ testRooted :: Spec testRooted = describe "isRooted/isUnrooted" $ do it "/ is rooted" $ Path.isRooted (p "/") `shouldBe` True it "/x is rooted" $ Path.isRooted (p "/x") `shouldBe` True - it ". is rooted" $ Path.isRooted (p ".") `shouldBe` True - it "./x is rooted" $ Path.isRooted (p "./x") `shouldBe` True + it ". is unrooted" $ Path.isUnrooted (p ".") `shouldBe` True + it "./x is unrooted" $ Path.isUnrooted (p "./x") `shouldBe` True it "x is unrooted" $ Path.isUnrooted (p "x") `shouldBe` True it "x/y is unrooted" $ Path.isUnrooted (p "x/y") `shouldBe` True it ".. is unrooted" $ Path.isUnrooted (p "..") `shouldBe` True @@ -118,12 +118,10 @@ testSplitRoot = describe "splitRoot" $ do fmap (fmap str . snd) r `shouldBe` Just (Just "home") it "relative has no root" $ isNothing (Path.splitRoot (p "home")) `shouldBe` True - it ". 
is root" $ - fmap (str . fst) (Path.splitRoot (p ".")) `shouldBe` Just "." - it "./home splits correctly" $ do - let r = Path.splitRoot (p "./home") - fmap (str . fst) r `shouldBe` Just "./" - fmap (fmap str . snd) r `shouldBe` Just (Just "home") + it ". has no root" $ + isNothing (Path.splitRoot (p ".")) `shouldBe` True + it "./home has no root" $ + isNothing (Path.splitRoot (p "./home")) `shouldBe` True testSplitFile :: Spec testSplitFile = describe "splitFile" $ do @@ -203,6 +201,12 @@ testEqPath = describe "eqPath" $ do Path.eqPath id (p "x") (p "X") `shouldBe` False it "relative paths equal by default" $ Path.eqPath id (p ".") (p ".") `shouldBe` True + it "./bin equals bin" $ + Path.eqPath id (p "./bin") (p "bin") `shouldBe` True + it "./.. equals .." $ + Path.eqPath id (p "./..") (p "..") `shouldBe` True + it "./../bin equals ../bin" $ + Path.eqPath id (p "./../bin") (p "../bin") `shouldBe` True it "allowRelativeEquality False makes relative paths unequal" $ Path.eqPath (Path.allowRelativeEquality False) (p ".") (p ".") `shouldBe` False @@ -394,11 +398,11 @@ testSplitRootExtended = describe "splitRoot (extended)" $ do toList (a, b) = (str a, fmap str b) cases = [ ("/", Just ("/", Nothing)) - , (".", Just (".", Nothing)) - , ("./", Just ("./", Nothing)) + , (".", Nothing) + , ("./", Nothing) , ("/home", Just ("/", Just "home")) , ("//", Just ("//", Nothing)) - , ("./home", Just ("./", Just "home")) + , ("./home", Nothing) , ("home", Nothing) ] mapM_ diff --git a/test/Streamly/Test/FileSystem/WindowsPath.hs b/test/Streamly/Test/FileSystem/WindowsPath.hs index 4099f40028..fa5cb46596 100644 --- a/test/Streamly/Test/FileSystem/WindowsPath.hs +++ b/test/Streamly/Test/FileSystem/WindowsPath.hs @@ -67,18 +67,6 @@ testFromString = describe "fromString" $ do -- Validation ------------------------------------------------------------------------------- -testValidatePathStrict :: Spec -testValidatePathStrict = describe "validatePath' (strict)" $ do - let isValid = isJust 
. Path.validatePath' . Path.encodeString - it "lone double separator invalid" $ - isValid "\\\\" `shouldBe` False - it "UNC server-only invalid" $ - isValid "\\\\server\\" `shouldBe` False - it "UNC server+share valid" $ - isValid "\\\\server\\x" `shouldBe` True - it "\\\\?\\UNC\\server alone invalid" $ - isValid "\\\\?\\UNC\\server" `shouldBe` False - ------------------------------------------------------------------------------- -- Separators ------------------------------------------------------------------------------- @@ -107,7 +95,7 @@ testRooted = describe "isRooted" $ do Path.isRooted (p "\\\\server\\share\\") `shouldBe` True it "verbatim root is rooted" $ Path.isRooted (p "\\\\?\\C:\\x") `shouldBe` True - it ". is rooted" $ Path.isRooted (p ".") `shouldBe` True + it ". is unrooted" $ Path.isUnrooted (p ".") `shouldBe` True it "x is unrooted" $ Path.isUnrooted (p "x") `shouldBe` True ------------------------------------------------------------------------------- @@ -178,6 +166,14 @@ testEqPathDefault = describe "eqPath default" $ do it "allowRelativeEquality False rejects drive-only relatives" $ do eqWith (Path.allowRelativeEquality False) "C:" "C:" `shouldBe` False eqWith (Path.allowRelativeEquality False) "C:x" "C:x" `shouldBe` False + it "C:bin equals C:./bin" $ + eq "C:bin" "C:./bin" `shouldBe` True + -- Note: C: cannot carry a trailing separator (C:\\ means the absolute + -- root of drive C, not the current dir). Compare against C:. instead. + it "C: equals C:." $ + eq "C:" "C:." `shouldBe` True + it "C:.. equals C:./.." $ + eq "C:.." "C:./.." 
`shouldBe` True it "redundant separators ignored" $ eq "x//y" "x/y" `shouldBe` True it "dot segments ignored" $ @@ -345,7 +341,9 @@ testValidatePath = describe "validatePath" $ do , ("\\\\\\", False) , ("\\\\x", False) , ("\\\\x\\", False) -- server only, no share + , ("\\\\server\\", False) , ("\\\\x\\y", True) + , ("\\\\server\\x", True) , ("//x/y", True) , ("\\\\prn\\y", False) , ("\\\\x\\\\", False) @@ -362,6 +360,7 @@ testValidatePath = describe "validatePath" $ do -- long UNC (\\?\UNC\) , ("\\\\?\\UnC\\x", True) -- UnC is treated as share name , ("\\\\?\\UNC\\x", False) + , ("\\\\?\\UNC\\server", False) , ("\\\\?\\UNC\\c:\\x", True) -- DOS device namespace , ("\\\\.\\x", True) @@ -383,8 +382,8 @@ testIsRootedWindows = describe "isRooted (windows-specific)" $ do cases = [ ("/", True) , ("/x", True) - , (".", True) - , ("./x", True) + , (".", False) + , ("./x", False) , ("c:", True) , ("c:x", True) , ("c:/", True) @@ -508,7 +507,6 @@ main = hspec $ do describe moduleName $ do testFromString testValidatePath - testValidatePathStrict testSeparators testRooted testIsRootedWindows