In Ruby, optparse raises an error when a filename contains certain characters
I'm using optparse in a Ruby program (ruby 2.7.1p83) under Linux. If any of the command-line arguments are filenames with "special" characters in them, the parse! method fails with this error:
invalid byte sequence in UTF-8
This is the code which fails ...
parser = OptionParser.new {
|opts|
... etc. ...
}
parser.parse! # error occurs here
I know about the scrub method and other ways to do encoding in Ruby. However, the place where the error occurs is in a library routine (OptionParser#parse!), and I have no control over how this library routine deals with strings.
I could pre-process the command-line arguments and replace the special characters in these arguments with an acceptable encoding, but then, in the case where the argument is a file name, I will be unable to open that file later in the program, because the filename I have accepted into the program will have been altered from the file's original name.
I could do something complicated like pre-traversing the arguments, building a hashmap where the key is the encoded argument and the value is the original argument, changing the ARGV values to the encoded values, parsing the encoded arguments using OptionParser, and then, after OptionParser completes, going through the resulting arguments and using the hashmap to replace the encoded arguments with their original values ... and then continuing with the program.
But I'm hoping that there would be a much simpler way to solve this problem in ruby.
Thank you in advance for any ideas or suggestions.
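One possibly simpler alternative to the hashmap scheme described above is to hand OptionParser copies of the arguments that are tagged as binary (ASCII-8BIT). The bytes are not altered, so the names should still open the right files, and the ASCII-only patterns inside optparse should no longer trip over invalid UTF-8. This is a minimal sketch under assumptions, not a confirmed fix: raw_args is a hypothetical stand-in for ARGV, and the -v and --out options are invented for illustration.

```ruby
require 'optparse'

# Hypothetical stand-in for ARGV. "\xC4" is a lone byte that is not valid UTF-8.
raw_args = ['-v', '--out', "\xC4out", "\xC4foo", 'rtest.rb']

# String#b returns a copy tagged ASCII-8BIT; the bytes themselves are unchanged.
# In a real script this would be: argv = ARGV.map(&:b)
argv = raw_args.map(&:b)

options = {}
parser = OptionParser.new do |opts|
  # Illustrative options only; they are not taken from the original program.
  opts.on('-v', '--verbose') { options[:verbose] = true }
  opts.on('--out FILE') { |file| options[:out] = file }
end

# Parse the binary-tagged copy instead of ARGV itself.
parser.parse!(argv)

p options
p argv
```

After parsing, the leftover entries in argv are still the original byte sequences, so they can be passed to File.open as they are. For display purposes only, a scrubbed or re-encoded copy can be printed while the binary string is kept for file access.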
UPDATE: Here is more detailed info ...
I wrote the following minimal program called rtest.rb in order to test this:
#!/usr/bin/env run-ruby
# -*- ruby -*-
require 'optparse'
parser = OptionParser.new {
}
parser.parse!
Process.exit(0)
I ran it as follows, with the only files present in the current directory being rtest.rb itself and another file having this name: Äfoo ...
export LC_TYPE='en_us.UTF-8'
export LC_COLLATE='en_us.UTF-8'
./rtest.rb *
It generated the following error and stack trace ...
Traceback (most recent call last):
7: from /home/hippo/bin/rtest.rb:8:in `<main>'
6: from /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb:1691:in `parse!'
5: from /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb:1666:in `permute!'
4: from /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb:1569:in `order!'
3: from /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb:1575:in `parse_in_order'
2: from /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb:1575:in `catch'
1: from /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb:1579:in `block in parse_in_order'
/opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb:1579:in `===': invalid byte sequence in UTF-8 (ArgumentError)
Here is what appears in the pertinent section of the file /opt/rubies/ruby-2.7.1/lib/ruby/2.7.0/optparse.rb. See line 1579 ...
1572 def parse_in_order(argv = default_argv, setter = nil, &nonopt) # :nodoc:
1573 opt, arg, val, rest = nil
1574 nonopt ||= proc {|a| throw :terminate, a}
1575 argv.unshift(arg) if arg = catch(:terminate) {
1576 while arg = argv.shift
1577 case arg
1578 # long option
1579 when /\A--([^=]*)(?:=(.*))?/m
1580 opt, rest = $1, $2
In other words, the regex match on the argument is failing due to this encoding issue.
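The failure can be reproduced without optparse at all, which suggests that the cause is the string's encoding tag rather than anything specific to OptionParser. The sketch below applies the pattern from line 1579 to a string holding the byte 0xC4 tagged as UTF-8, and then to a binary-tagged copy of the same bytes. The helper name match_outcome is made up for this illustration.

```ruby
# A name containing the single byte 0xC4, tagged as UTF-8 (which makes it invalid).
arg = "\xC4foo".dup.force_encoding(Encoding::UTF_8)

# The long-option pattern quoted from optparse.rb line 1579.
long_option = /\A--([^=]*)(?:=(.*))?/m

# Hypothetical helper: reports the result of the match, or the error message if it raises.
def match_outcome(pattern, string)
  pattern === string ? :matched : :no_match
rescue ArgumentError => e
  e.message
end

puts arg.valid_encoding?                  # is the UTF-8 tag consistent with the bytes?
puts match_outcome(long_option, arg)      # match against the UTF-8-tagged string
puts match_outcome(long_option, arg.b)    # match against a binary-tagged copy
```

On the Ruby versions mentioned here, I would expect the second line of output to be the same "invalid byte sequence in UTF-8" message as in the traceback, and the third to be a plain non-match, since the binary copy has the same bytes but no UTF-8 validity requirement.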
When I have time (not right away, unfortunately), I'll put some code into that module to do encoding of the arg variable, to see if this might fix the problem.
FURTHER UPDATE: I am running under Ubuntu 20.04, and the version of Ruby that's offered is 2.7.0. I also managed to get 2.7.1 running on my ancient Debian 8 box. This error occurs in both environments. I would have to install a newer version of Ruby or compile it from source before I could try version 2.7.7 or version 3.x.
YET ANOTHER UPDATE: I had some unexpected spare time, so I built ruby-3.3.0 from source and re-ran the test. I got the same error!
% /opt/local/rubies/ruby-3.3.0/bin/ruby ./rtest.rb *
/opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1640:in `===': invalid byte sequence in UTF-8 (ArgumentError)
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1640:in `block in parse_in_order'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1636:in `catch'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1636:in `parse_in_order'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1630:in `order!'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1739:in `permute!'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1764:in `parse!'
from ./rtest.rb:8:in `<main>'
However, I now think the error occurs because the filename is encoded in an unusual manner. If I do echo * in that directory, I see this, which is what I expect:
% echo *
Äfoo rtest.rb
However, if I do /bin/ls in the same directory, I see this:
% /bin/ls *
''$'\304''foo' rtest.rb
And even the OS can't recognize the file with the name specified as follows ...
% /bin/cat 'Äfoo'
/bin/cat: Äfoo: No such file or directory
But if I use the longer, encoded file name, the OS has no trouble accessing the file ...
% /bin/cat ''$'\304''foo'
File contents
File contents
The ls command seems to know how to encode the Äfoo filename into ''$'\304''foo', but Ruby doesn't seem to know how to do this.
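A plausible reading of the ls output is that \304 is an octal escape for the single byte 0xC4, which is Ä in ISO-8859-1 (Latin-1), whereas Ä in UTF-8 is the two bytes C3 84. If so, the name on disk and the name typed in a UTF-8 session are different byte sequences, which would explain both the cat failure and the Ruby error. The sketch below only compares the bytes. The variable names and the hex_bytes helper are made up for illustration, and the Latin-1 assumption is mine.

```ruby
# The name as ls reports it: octal \304 (hex C4) followed by "foo".
on_disk = "\304foo".b

# The name as typed in a UTF-8 session: Ä is U+00C4.
typed = "\u00C4foo"

# Hypothetical helper: render a string's bytes as hex pairs.
def hex_bytes(str)
  str.bytes.map { |b| format('%02X', b) }.join(' ')
end

puts hex_bytes(on_disk)   # C4 66 6F 6F
puts hex_bytes(typed)     # C3 84 66 6F 6F

# Interpreting the on-disk bytes as ISO-8859-1 and converting to UTF-8
# gives the name as typed, which supports the Latin-1 reading.
as_utf8 = on_disk.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts as_utf8 == typed     # true
```

Under that assumption, the original bytes are needed to open the file, and a converted copy like as_utf8 is only suitable for display.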