Skip to content

feat(server): support handling spaces in filenames #107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 26, 2023

Conversation

omerbustun
Copy link
Contributor

Description

Added an option to enable encoding for returned URL. It can be enabled in the config by setting url_encode_filenames = true.

Motivation and Context

My terminal wasn't hyperlinking the returned URL when the file name had spaces in it. I wanted to be able to click on the URL in the terminal rather than copying and pasting it in a browser.

How Has This Been Tested?

  • Test cases
  • Manual testing

Changelog Entry

Added

  • Add a function that encodes the file name and consequently the returned URL.
  • Add a config option to turn the feature on and off.

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (no code change)
  • Refactor (refactoring production code)
  • Other

Checklist:

  • My code follows the code style of this project.
  • I have updated the documentation accordingly.
  • I have formatted the code with rustfmt.
  • I checked the lints with clippy.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@omerbustun omerbustun requested a review from orhun as a code owner August 2, 2023 21:08
@orhun
Copy link
Owner

orhun commented Aug 5, 2023

Hello, can you give an use case example for this?

@codecov-commenter
Copy link

codecov-commenter commented Aug 5, 2023

Codecov Report

Patch coverage: 84.61% and project coverage change: -0.48% ⚠️

Comparison is base (31c6cbf) 70.15% compared to head (198810d) 69.68%.
Report is 1 commits behind head on master.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #107      +/-   ##
==========================================
- Coverage   70.15%   69.68%   -0.48%     
==========================================
  Files          11       11              
  Lines         563      574      +11     
==========================================
+ Hits          395      400       +5     
- Misses        168      174       +6     
Flag Coverage Δ
unit-tests 69.68% <84.61%> (-0.48%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/paste.rs 84.74% <66.66%> (-0.60%) ⬇️
src/config.rs 72.41% <80.00%> (-10.20%) ⬇️
src/server.rs 81.86% <100.00%> (-0.23%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@orhun
Copy link
Owner

orhun commented Aug 7, 2023

Can you also push the updated Cargo.lock?

@orhun
Copy link
Owner

orhun commented Aug 11, 2023

ping @omerbustun

@omerbustun
Copy link
Contributor Author

Hello @orhun,

Thank you for reviewing my PR.

For the use case example:

Consider a scenario where I upload a file named Project Plan 2023.docx using Rustypaste. Without URL encoding, the returned URL might look something like:

https://paste.site.com/Project Plan 2023.docx

Many terminals recognize URLs and make them clickable for user convenience. However, if the URL contains spaces or other special characters, the terminal might not recognize it as a single clickable link. Instead, it might split it into multiple segments or not make it clickable at all.

This not only affects the convenience of directly clicking the link, but also the action of copying it. When the URL is recognized as one continuous piece, users can generally right-click to copy it from the terminal. But when it has spaces, it doesn't get recognized as a URL, forcing users to manually select the entire link to copy, which can be cumbersome.

By encoding the filename in the URL, the returned URL would look like:

https://paste.site.com/Project%20Plan%202023.docx

This format is more universally recognized by terminals and web browsers, making it easier for users to directly interact with the link from their terminal.

I believe this enhancement can improve the user experience, especially for those who frequently interact with the terminal.

I've also pushed the updated Cargo.lock as per your request.

Please let me know if there are any other changes or clarifications needed.

Best regards,
Ömer

@tessus
Copy link
Collaborator

tessus commented Aug 11, 2023

@omerbustun have you tested what happens with unicode characters? Are they encoded as well?

e.g.

https://paste.site.com/감사합니다
https://paste.site.com/🤯.txt

@omerbustun
Copy link
Contributor Author

Hello @tessus,

Yes, the urlencoding crate I utilized in the feature handles the encoding of unicode characters seamlessly. It assumes UTF-8 encoding and percent-encodes every byte except alphanumerics and some special characters (-, _, . and ~).

As per the examples you provided, they become the following:

https://paste.site.com/%EA%B0%90%EC%82%AC%ED%95%A9%EB%8B%88%EB%8B%A4
https://paste.site.com/%F0%9F%A4%AF.txt

I hope this clarifies the behavior of the encoding with respect to unicode characters.

Best regards,
Ömer

@tessus
Copy link
Collaborator

tessus commented Aug 11, 2023

Thanks for the info. It's just that IMO unicode characters do not need to be encoded. Most terminals are unicode compatible and reading an encoded unicode string is impossible. e.g I don't know that %EA%B0%90%EC%82%AC%ED%95%A9%EB%8B%88%EB%8B%A4 is 감사합니다.
This might be ok for people who write mostly in the ascii alphabet, but everything else is unreadable.

I think the only reason you want to do this encoding is to be able to click on a link with spaces. Woudn't it be better to only encode the spaces (when random URLs off)?
To be fair, clients all handle spaces and unicode. You can curl unicode URLs (and URLs with spaces) or paste them in the browser.

Or maybe rustypaste should just remove spaces from the file name or replace them with an underscore.

@omerbustun
Copy link
Contributor Author

You've got a point about modern clients handling spaces and unicode in URLs without any issues. However, the primary motivation behind my proposal was rooted more in user convenience than technical compatibility. In my experience with Windows Terminal, URLs with spaces aren't recognized as clickable links, as shown in the screenshot.

resim

Your suggestion to only encode spaces or replace them with underscores is a viable alternative that would address the main concern while preserving the readability of filenames with unicode characters. This way, non-ASCII users won't be faced with an unreadable string, yet the core functionality of having clickable links in terminals would still be achieved.

It's also worth noting that since this feature can be toggled on or off via the config, users who prefer the original behavior or find the encoding unnecessary can simply keep it disabled.

With that said, I'm open to refining this feature based on your feedback. Would you prefer the encoding to be limited to spaces only, or to replace spaces with underscores? I'm flexible and willing to adjust the implementation accordingly.

Best,
Ömer

@tessus
Copy link
Collaborator

tessus commented Aug 11, 2023

I was just thinking out loud. Orhun will have to decide how to handle this.

Since I almost never use file names with spaces, I don't really have this problem. I would just hate to see such an encoding as the default and only behavior. If it were behind a config option (as you have coded it), I wouldn't really care.

But in reality, how many people deactivate the random feature just to keep the file name as is? Don't forget, this also has the disadvantage that files with the same name are overwritten.

Personally I think that replacing spaces with underscores is the best option, but that's just my opinion.

@oflifurkan
Copy link
Contributor

oflifurkan commented Aug 11, 2023

I was just thinking out loud. Orhun will have to decide how to handle this.

Since I almost never use file names with spaces, I don't really have this problem. I would just hate to see such an encoding as the default and only behavior. If it were behind a config option (as you have coded it), I wouldn't really care.

But in reality, how many people deactivate the random feature just to keep the file name as is? Don't forget, this also has the disadvantage that files with the same name are overwritten.

Personally I think that replacing spaces with underscores is the best option, but that's just my opinion.

As a regular user of this project for my friends and family, I've noticed a recurring issue when sharing links with spaces. When I copy and send these links through apps like Discord and WhatsApp, they aren't recognized as hyperlinks due to the spaces. This pull request effectively addresses this common problem. Offering the option to encode the URL in this PR is a thoughtful solution that will significantly improve the sharing experience on such platforms.

@omerbustun
Copy link
Contributor Author

But in reality, how many people deactivate the random feature just to keep the file name as is? Don't forget, this also has the disadvantage that files with the same name are overwritten.

In my setup, I use the suffix_mode for the random URL. This mode allows me to retain a part of the original filename while adding a unique suffix. So I can differentiate the files by their URL and also ensure that files with the same name don't get overwritten.

@tessus
Copy link
Collaborator

tessus commented Aug 11, 2023

As a regular user of this project for my friends and family, I've noticed a recurring issue when sharing links with spaces. When I copy and send these links through apps like Discord and WhatsApp, they aren't recognized as hyperlinks due to the spaces.

Which can also be solved by replacing spaces with underscores.

Don't get me wrong, I am not against this PR or making links with spaces clickable. This is not what I am saying. I am just asking myself if adding another dependency for encoding URLs is necessary when the same can be achieved by just replacing spaces. That's all.

@orhun
Copy link
Owner

orhun commented Aug 14, 2023

It is an edge case but I think it doesn't hurt to have this feature.

But before merge, @omerbustun can you complete the following checklist?

  • Resolve conflicts with master
  • Add unit tests for this configuration option
  • Add fixture tests (see fixtures)

@tessus
Copy link
Collaborator

tessus commented Aug 15, 2023

Sorry for being a pain in the ass, but wouldn't it be much easier and less complex in terms of future maintenance overhead just to add one line to paste.rs?

        let file_name = path
            .file_name()
            .map(|v| v.to_string_lossy())
            .unwrap_or_default()
            .to_string()
            .replace(" ", "_");                // <---- this line here on line 169 

@orhun
Copy link
Owner

orhun commented Aug 15, 2023

@tessus actually yeah, it seems like the only point of using URL encoding is to replace the whitespaces with underscores. I personally never hit a case like this so I'm okay with both ways. Of course, avoiding a dependency would be nice considering that urlencoding is now unmaintained.

- Introduced an option to enable encoding for filenames in the returned URL.
- Modified the encoding approach to specifically target whitespaces, replacing them with `%20`.
- Included unit tests to validate this encoding approach.
- Added fixture tests for broader functional verification.
- Removed the `urlencoding` dependency from `Cargo.toml`.
@tessus
Copy link
Collaborator

tessus commented Aug 16, 2023

@omerbustun may I ask why the code is more than the one line I suggested? The filename is not completely retained anyway, because a random suffix is added. Or do you really want to keep the spaces in the filename? rustypaste is not a storage backend, accessed via an NFS or Samba share. I am not against it, I am just curious.

For me translating spaces into underscores would not even have to be guarded by a config option. I think rustypaste is for quickly making data available to others. There is no need to retain a literal space in the URL. Or is there?

@omerbustun
Copy link
Contributor Author

Hi @orhun and @tessus,

I've made some modifications to the approach based on the feedback:

  • Encoding whitespaces: Instead of using the urlencoding library to encode all characters, I've now simplified the approach to only replace whitespaces with %20. This directly addresses the primary concern of filenames with spaces, and keeps the implementation straightforward.
  • Tests: I've included both unit and fixture tests as you requested, @orhun.
  • Dependency on urlencoding: While I've removed the direct use of the urlencoding library in my approach, it's worth noting that this library is already a transitive dependency in the project, brought in by other dependencies.
  • Regarding the suggested change: @tessus, while replacing spaces with underscores is a simpler approach, the advantage of the configuration option I proposed is flexibility. Users who find the feature unnecessary can keep it turned off, while others who need it can enable it. Your approach, although simpler, would always replace spaces, making it a global change. In the end it all boils down to preference, I prefer having and providing options.

@tessus
Copy link
Collaborator

tessus commented Aug 17, 2023

No worries, I just meant it's not a bad idea to replace spaces with underscores globally. File names are not retained anyway, so why would I care, whether there is a space or an underscore?

When I added the random suffix mode, I only wanted to make sure that people (including myself) already have an idea of what could be in the file. Which is absolutely impossible to ascertain by reading a file name called hopping-elephant.log. ;-)

For me a file name is easier to read with underscores than with %20. That's just my opinion, which is of course subjective.

As mwntioned before, I am not against this PR or the config parameter.

@orhun
Copy link
Owner

orhun commented Aug 18, 2023

Thanks for the changes @omerbustun!

For me a file name is easier to read with underscores than with %20.

I would say yes, I would vote against using %20 instead of _ which made me think about the following idea.


Right now we have this:

[server]
url_encode_filenames = true

Instead, I propose:

[paste]
replace_filename = [ { from = " ", to = "%20" } ]

So that you can configure it to replace whitespaces with underscores as follows:

[paste]
replace_filename = [ { from = " ", to = "_" } ]

Or even multiple characters/strings:

[paste]
replace_filename = [ { from = " ", to = "---" }, { from = "foo", to = "bar" } ]

What do you think? @omerbustun @tessus

@tessus
Copy link
Collaborator

tessus commented Aug 18, 2023

I think the issue is a bit more complicated.

The current implementation does not change the filename in the upload directory. It only returns the filename encoded with %20 instead of spaces.

My idea of just replacing spaces with underscores would take place in paste.rs and thus the file would not be stored with spaces in the upload dir, but with underscores. Which is also fine, but the concept is different.

The idea of a mapping function is intriguing, but introduces a certain complexity to the code:

  • when will the mapping be applied
    • before/after adding the suffix
    • before/after adding the extension
  • are we storing the file in the upload dir with %20?
  • certain characters must be filtered out
    • from: . should not be replaced
    • to: ., /, \, : have certain meanings in filesystems and must be avoided in a filename
  • there might be an attack vector in such a replacement, unless a thorough filter and checking of user input is done

While I really like the idea, it's a huge maintenance burden and adds unnecessary complexity to the code.

IMO, there are 3 options:

  • leave it in the current state of this PR
  • do the space to underscore replacement (w/o a config option)
  • rename the config option to paste.replace_spaces and allow only 2 options:
    • percent (which replaces the spaces with %20)
    • underscore (which replaces the spaces with _)

@omerbustun
Copy link
Contributor Author

Hi @orhun and @tessus,

Thank you both for the active discussion. Here's my two cents.

  • The PR encodes the URL, not the filename. The file in the upload directory and when downloaded retains its spaces. URLs need to be hyperlink-friendly; filenames don't have to be. Hardcoding %20 into filenames isn't the objective.
  • Replacing spaces with underscores in filenames might be practical for URLs but moves away from this PR's intent. If users disable random_url, they might want original filenames intact.
  • replace_filename is an interesting proposal, but it does introduce complexity and potential edge cases as pointed out by @tessus.

Given the above, this PR's approach provides a balance between usability and preserving original file names so I'm leaning towards leaving it as is in the options stated by @tessus. Appreciate the feedback, and let's determine the best direction.

- Fixed errors in the fixture configuration to ensure tests run as expected.
- Reformatted line endings for consistency across the codebase.
- Made necessary adjustments to adhere to Clippy's recommendations.
@tessus
Copy link
Collaborator

tessus commented Aug 20, 2023

Replacing spaces with underscores in filenames might be practical for URLs but moves away from this PR's intent

I do understand what you are saying, but I am trying to explain that when you merge this PR and later on it is decided that replacing spaces with underscores is a good idea anyway, the code that was merged for this PR becomes cruft and useless, since there won't be any spaces anymore in filenames.

This PR showed how useless and annoying spaces are, so why keep them. There is no reason to. As mentioned before, rustypaste is not a storage backend which goal it is to preserve filenames. Filenames were not the goal of this project, neither is the goal to preserve spaces in filenames. This is not why rustypaste was created.

Anyhoo, I don't eant to dismiss the notion that spaces are important, so the 3rd option I mentioned allows for both situations. It didn't mean that one has to replace the spaces with %20 in filenames, but rather use the codepath currently in this PR.

I would like to add a PR in the future that replaces spaces with underscores. However, if this is done automatically this code becomes obsolete. To not do that another config option is necessary. But why not use the same config option to handle both cases?

Maybe the name of the option I suggested was off. Maybe it should be

paste.filenames and the 2 options: encode (your codepath), replace_spaces (my codepath).

Or maybe you can come with a better naming. The current one can only be used for encoding, since the word enccode is part of the config option. To allow for both it should be more generic.

Another idea (although not very good, but it's just an example):
paste.algorithm: url_encode, replace_spaces

P.S.: The reason why I think one config option makes sense is that the one excludes the other.

omerbustun and others added 2 commits August 21, 2023 16:01
- Implemented a configuration choice for space handling in filenames: encode, replace with underscores or none.
- Added corresponding unit and fixture tests.
@omerbustun omerbustun closed this Aug 21, 2023
@omerbustun omerbustun deleted the add_url_encoding branch August 21, 2023 13:07
@omerbustun omerbustun restored the add_url_encoding branch August 21, 2023 13:08
@omerbustun omerbustun reopened this Aug 21, 2023
@omerbustun
Copy link
Contributor Author

@tessus,

I've updated the PR to provide three options for handling spaces in filenames: encode, replace, and none. I believe this addresses both of our concerns. Additionally, I've added unit and fixture tests corresponding to the latest changes.

I look forward to seeing it merged.

Best,
Ömer

@tessus
Copy link
Collaborator

tessus commented Aug 21, 2023

Thanks for the change. I have 2 comments:

We only need 2 options. none is irrelevant, because in that case the option is just not set. (commented out)

The possible values for that option should be handled via an enum:

enum SpaceHandling {
    Encode,
    Replace
}

impl SpaceHandling {
    fn as_str(&self) -> &'static str {
        match self {
            SpaceHandling::Encode => "encode",
            SpaceHandling::Replace => "replace"
        }
    }
}

I think this is the Rust way, but orhun would know better. I could be totally wrong. I am new to Rust, but I believe I have seen this type of contruct in such cases.

Fixed misplaced line as per @tessus's review feedback, preventing duplicate handling of spaces in filenames.
@tessus
Copy link
Collaborator

tessus commented Aug 22, 2023

I believe the as_str method is not even necessary, if you use:

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum SpaceHandling {
    Encode,
    Replace,
}

There is a similar code block in random.rs. But let's wait for orhun's input.

Btw, I am also more than happy to make the "enum" change, if you allow me to push to your branch.

@orhun
Copy link
Owner

orhun commented Aug 24, 2023

Yeah having an enum for this would be nice I think.

- Introduced `SpaceHandling` enum to manage filename spaces.
- Updated server.rs and paste.rs to utilize the new method.
- Added unit tests for the new filename handling method.
- Adjusted fixture tests to reflect these changes.
@omerbustun
Copy link
Contributor Author

I've updated the code to use an enum and made changes to server.rs and paste.rs based on your suggestions. I moved the relevant parts to util.rs, though I'm not sure if this fits the project's structure. I added new unit tests and adjusted the existing fixture tests. I'm looking forward to your feedback.

Also, since I'm quite new to Rust and this is my first PR, I really appreciate your patience and guidance so far :).

@orhun orhun requested review from orhun and tessus August 25, 2023 11:28
Copy link
Collaborator

@tessus tessus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I also support orhun's idea to add the function to the enum. Then this should be good to go.

- Relocated filename processing logic to reside within the `SpaceHandling` enum, enhancing encapsulation.
- Updated calls in `server.rs` and `paste.rs` to use the new method structure.
- Adjusted unit tests to align with the refactored function location and call pattern.
Copy link
Collaborator

@tessus tessus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the 2 inline comments, it looks great. My suggestions only remove 2 unnecessary one line assignments. It's not critical, but I think it's best to avoid those. I made the same mistake myself before. ;-)

Thanks for this PR!

- Incorporated the inline suggestions made by @tessus, removing the unnecessary one-line assignments.
- Ensured the code adheres to Rust guidelines by running clippy and fmt.
@omerbustun
Copy link
Contributor Author

Thank you for your feedback, @tessus! I've incorporated the suggested changes and made sure everything is aligned by running both clippy and fmt on the final commit. Let me know if there's anything else!

Copy link
Collaborator

@tessus tessus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't see anything else. Unless I missed something this is good to go.

@orhun orhun self-requested a review August 26, 2023 12:11
@orhun orhun changed the title feat(server): add url encoding feat(server): support handling spaces in filenames Aug 26, 2023
Copy link
Owner

@orhun orhun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the implementation/discussion @omerbustun @tessus! 💖

@orhun orhun merged commit 95379c9 into orhun:master Aug 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants