-
Notifications
You must be signed in to change notification settings - Fork 283
Feat: Std.FileIO.ReadFile
and WriteFile
#6198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I added my own versions of FileIO.ReadFile and FileIO.WriteFile that I needed for my own application.
Fixes #6196 I updated the tests to ensure it was breaking and that this PR fixes the test. Basically, a test was testing if the type was a bitvector type, but this type was a synonym type with the new resolver so that needed to be updated. <small>By submitting this pull request, I confirm that my contribution is made under the terms of the [MIT license](https://github.com/dafny-lang/dafny/blob/master/LICENSE.txt).</small>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely in favor of the feature!
Source/DafnyStandardLibraries/src/Std/TargetSpecific/FileIO5-notarget-java-cs-js-go-py.dfy
Outdated
Show resolved
Hide resolved
Source/DafnyStandardLibraries/src/Std/TargetSpecific/FileIO5-notarget-java-cs-js-go-py.dfy
Show resolved
Hide resolved
if !Utf8EncodingForm.IsWellFormedCodeUnitSequence(bytes) { | ||
return Failure("Byte sequence of file '" + fileName + "' is not well formed UTF8"); | ||
} | ||
var x: seq<bv24> := Utf8EncodingForm.DecodeCodeUnitSequence(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can avoid all the extra error checking by using UnicodeStringsWithUnicodeChar.FromUTF8Checked
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That method only returns an Option
. This error checking makes it possible to precisely report at which index there is an error. I think I prefer that. Later, we could still refactoring this error checking code if it's used somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay fair. It would be great to improve Unicode
with more precise error messages as well at some point.
It does feel like you're reimplementing a fair bit of concepts from the Unicode module, and we have to be very careful to be consistent and sound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For soundness, the good news is that all my code is only establishing the proof that I can call into the standard libraries, so we should be good for that.
Source/DafnyStandardLibraries/src/Std/TargetSpecific/FileIO5-notarget-java-cs-js-go-py.dfy
Outdated
Show resolved
Hide resolved
var x: seq<bv24> := Utf8EncodingForm.DecodeCodeUnitSequence(bytes); | ||
for i := 0 to |x| | ||
invariant forall k | 0 <= k < i :: x[k] < 1 << 16 { | ||
if !(x[i] < 1 << 16) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong, in --unicode-char:true
mode (which is the only mode the libraries currently support) valid characters go up to 0x10FFFF
, so you may reject valid data here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much as I'd love to do that, when I run make update-standard-libraries
with the change you suggested,
invariant forall k | 0 <= k < i :: x[k] < 0x10FFFF {
if !(x[i] < 0x10FFFF) {
return Failure("index " + Strings.OfInt(i) + " is not a valid char");
}
I get.
FileIO5-notarget-java-cs-js-go-py.dfy(93,54): Error: bit-vector value to be converted might not fit in char
|
93 | var s := seq(|x|, i requires 0 <= i < |x| => x[i] as char);
| ^^
which is consistent with the verification condition we emit:
var toWidth = 16;
if (toWidth < fromWidth) {
// Check "expr < (1 << toWidth)" in type "fromType" (note that "1 << toWidth" is indeed a value in "fromType")
PutSourceIntoLocal();
var toBound = BaseTypes.BigNum.FromBigInt(BigInteger.One << toWidth); // 1 << toWidth
var bound = BplBvLiteralExpr(tok, toBound, fromType.AsBitVectorType);
var boundsCheck = FunctionCall(expr.Origin, "lt_bv" + fromWidth, Bpl.Type.Bool, o, bound);
var dafnyBound = new BinaryExpr(expr.Origin, BinaryExpr.Opcode.LeftShift, Expression.CreateIntLiteral(expr.Origin, 1), Expression.CreateIntLiteral(expr.Origin, toWidth));
var dafnyBoundsCheck = new BinaryExpr(expr.Origin, BinaryExpr.Opcode.Lt, expr, dafnyBound);
builder.Add(Assert(tok, boundsCheck, new ConversionFit("bit-vector value", toType, dafnyBoundsCheck, errorMsgPrefix), builder.Context));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's because the correct check is to make sure the character isn't a surrogate as well. See here:
public static Expression MakeCharBoundsCheckUnicode(Expression expr) { |
I think you could simplify and just test if x[i] is char
?
if !Utf8EncodingForm.IsWellFormedCodeUnitSequence(bytes) { | ||
return Failure("Byte sequence of file '" + fileName + "' is not well formed UTF8"); | ||
} | ||
var x: seq<bv24> := Utf8EncodingForm.DecodeCodeUnitSequence(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay fair. It would be great to improve Unicode
with more precise error messages as well at some point.
It does feel like you're reimplementing a fair bit of concepts from the Unicode module, and we have to be very careful to be consistent and sound.
…g/dafny into feat-fileio-read-write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for pushing the errors improvements into Unicode, love it! Just some cleanup before merging
*/ | ||
method ReadUTF8FromFile(fileName: string) returns (r: Result<string, string>) { | ||
var bytes :- ReadBytesFromFile(fileName); | ||
return UnicodeStringsWithUnicodeChar.FromUTF8Checked(seq(|bytes|, i requires 0 <= i < |bytes| => bytes[i] as uint8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SO much better, thank you! The rest of the declarations below aside from WriteUTF8ToFile
are dead code and can be deleted now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd mention the improvements to error messages in Unicode as well, partially because it's a great improvement but also because it's technically breaking. Perhaps mention how to unbreak affected code as well with Result.ToOption()
expect res.value == expectedStr; | ||
} | ||
|
||
// Failure path: attempting to read from a blank file path should never work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A second failure case where the bytes are not UTF8 would be very valuable too, especially since AFAICT we don't prove that the index is correct. Could also put it in UnicodeExamples.dfy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnicodeExamples already has a test for ill cases, so I just added 2 expect lines to demonstrate what to expect as an error message.
@@ -127,43 +128,45 @@ abstract module Std.Unicode.UnicodeEncodingForm { | |||
|
|||
/** | |||
* Returns the unique partition of the given code unit sequence into minimal well-formed code unit subsequences, | |||
* or None if no such partition exists. | |||
* or Failure(CodeUnitSeq) if no such partition exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific and say it returns the suffix that couldn't be partitioned? I had to read a fair bit of other code to figure that out.
docs/dev/news/fileioreadwrite.feat
Outdated
With `--standard-libraries` you can now read an UTF-8 text files from the disk using `Std.FileIO.ReadFile(path: string): Result<string, string>`. | ||
To write some content to the disk, use `Std.FileIO.WriteFile(path: string, content: string): Outcome<string>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With `--standard-libraries` you can now read an UTF-8 text files from the disk using `Std.FileIO.ReadFile(path: string): Result<string, string>`. | |
To write some content to the disk, use `Std.FileIO.WriteFile(path: string, content: string): Outcome<string>`. | |
With `--standard-libraries` you can now read an UTF-8 text files from the disk using `Std.FileIO.ReadUTF8FromFile(path: string): Result<string, string>`. | |
To write some content to the disk, use `Std.FileIO.WriteUTF8ToFile(path: string, content: string): Outcome<string>`. |
This PR adds an opinionated way to write Dafny strings to files and read them back using UTF-8 encoding. With this PR, it becomes finally possible to create command-line utilities that read a file and process it in an easy way. ### What was changed? - Added `FileIO.ReadUTF8FromFile` and `FileIO.WriteUTF8ToFile` so that it's easier to read/write strings to text files within Dafny. - `FromUTF8Checked`, `FromUTF16Checked` and `DecodeCodeUnitSequenceChecked` all now return a `Result` instead of an `Option` so that the error message is clearer. ### Example For example, here is a tool that displays the content of a file ``` import opened Std.FileIO method Main(args: seq<string>) { expect |args| >= 2; var result :- expect ReadUTF8FromFile(args[1]); print result.value; } ``` Usage: ``` dafny run --standard-libraries cat.dfy cat.dfy ``` It prints itself. ### How has this been tested? I added tests to read and to write strings similar to the existing tests to read and write UFT-8 characters. <small>By submitting this pull request, I confirm that my contribution is made under the terms of the [MIT license](https://github.com/dafny-lang/dafny/blob/master/LICENSE.txt).</small>
This PR adds an opinionated way to write Dafny strings to files and read them back using UTF-8 encoding.
With this PR, it becomes finally possible to create command-line utilities that read a file and process it in an easy way.
What was changed?
FileIO.ReadUTF8FromFile
andFileIO.WriteUTF8ToFile
so that it's easier to read/write strings to text files within Dafny.FromUTF8Checked
,FromUTF16Checked
andDecodeCodeUnitSequenceChecked
all now return aResult
instead of anOption
so that the error message is clearer.Example
For example, here is a tool that displays the content of a file
Usage:
It prints itself.
How has this been tested?
I added tests to read and to write strings similar to the existing tests to read and write UFT-8 characters.
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.