Editorial Team
Asked: May 19, 2026

I got very far in a script I am working on only to find

I got very far in a script I am working on only to find out it has a problem reading UTF-8 characters.

A contact in Sweden made a VM on his machine whose name contains some UTF-8 characters. When my script hit that VM it lost its mind, even though it could read all of the other VMs, which are in the “normal” charset.

Anyhow, maybe my code will make more sense.

#!/usr/bin/perl
use strict;
use warnings;
#use utf8;
use Net::OpenSSH;

# Create a hash for storing the options needed by Net::OpenSSH
my %ssh_options = (
    port => '22',
    user => 'root',
    password => 'password'
);

# Create a new Net::OpenSSH object
my $ssh = Net::OpenSSH->new('192.168.2.101', %ssh_options);

# Create an array and capture the ESX\ESXi output from the current server
my @getallvms = $ssh->capture('vim-cmd vmsvc/getallvms');
shift @getallvms;
# Process data gathered from server
foreach my $vm (@getallvms) {
    # Match ID, NAME
    $vm =~  m/^(?<id> \d+)\s+(?<name> .+?)\s+/xm;
    my $id = "$+{id}";
    my $name = "$+{name}";
    print "$id\n";
    print "$name\n";
    print "\n";
}

I have narrowed the problem down to my regular expression. Here is the raw output from the server before the regular expression is applied:

416
TEST Box åäö!"''*#

And this is what I get after I apply my regular expression

416
TEST

For some reason the regular expression is not matching the whole name, and I just don’t know why. The regular expression in the example is already my third attempt at getting it to work.

I wrote my regular expression that way because I only need the first two blocks of information, whereas the expression you have wants to copy the entire line. The full line that I am matching looks like this:

432    TEST Box åäö!"''*#   [Store] TEST Box +w6XDpMO2IQ-_''_+Iw/TEST Box +w6XDpMO2IQ _''_+Iw.vmx   slesGuest    vmx-04
1 Answer

  1. Editorial Team
    Added an answer on May 19, 2026 at 10:21 pm

    The subpattern

    (?<name> .+?)\s+
    

    in your regular expression means “match and remember one or more non-newline characters, but stop as soon as you find whitespace,” so $name contains TEST because the pattern stopped matching when it saw the space just before Box.
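
To see the early stop in isolation, here is a minimal sketch that applies the pattern from the question to a shortened, ASCII-only stand-in for one getallvms line. The sample string is an assumption made up for illustration, not real server output.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for one line of getallvms output.
my $line = "416    TEST Box    [Store] TEST Box/TEST Box.vmx";

my ($id, $name);
if ($line =~ m/^(?<id> \d+)\s+(?<name> .+?)\s+/xm) {
    ($id, $name) = ($+{id}, $+{name});
}

# The lazy .+? hands control to \s+ at the first whitespace it reaches,
# so the name is cut off after the first word.
print "id=$id name=$name\n";
```

The captured name here is TEST rather than TEST Box, which mirrors the truncated output shown in the question.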

    The VI Toolkit wiki gives an example of the getallvms subcommand’s output:

    # vmware-vim-cmd -H 10.10.10.10 -U root -P password /vmsvc/getallvms
    Vmid    Name               File                 Guest OS       Version   Annotation
    64     bartPE    [store] BartPE/BartPE.vmx     winXPProGuest     vmx-04
    96     trustix   [store] Trustix/Trustix.vmx   otherLinuxGuest   vmx-04

    The letter case is slightly different from the example in your question ([store] here, [Store] in yours), but it appears that we can look for [store] as a bumper for the match, case-insensitively thanks to the /i switch:

    /^(?<id> \d+) \s+ (?<name> .+?) \s+ \[store]/mix
    

    The non-greedy quantifier +? means match one or more of something, but hand control to the rest of the pattern as quickly as possible. Remember that [ has a special meaning in regular expressions, but the pattern \[ matches a literal [ rather than introducing a character class.

    I think of this technique as bookending or tacking-and-stretching. If you want to extract a chunk of text that’s difficult to characterize, look for surrounding features that are easy to match—often as simple as ^ or $. Then use a stretchy pattern to grab everything in between, usually (.+) or (.+?). Read the “Quantifiers” section of the perlre documentation for an explanation of your many options.
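
As a rough sketch of bookending, the snippet below runs the pattern over two sample lines: the first is adapted from the wiki output quoted above, and the second is a shortened, hypothetical version of the line from the question. The variable names are made up for the example.

```perl
use strict;
use warnings;

# First line adapted from the wiki sample; second line is a hypothetical,
# ASCII-only shortening of the line in the question.
my @lines = (
    "64     bartPE    [store] BartPE/BartPE.vmx     winXPProGuest     vmx-04",
    "416    TEST Box  [Store] TEST Box/TEST Box.vmx slesGuest         vmx-04",
);

my @names;
for my $line (@lines) {
    # ^ and \[store\] are the bookends; the lazy .+? stretches between them.
    if ($line =~ /^(?<id> \d+) \s+ (?<name> .+?) \s+ \[store\]/ix) {
        push @names, $+{name};
    }
}

print "$_\n" for @names;
```

Because the pattern cannot finish until it reaches the bracketed text, the space inside TEST Box no longer ends the capture.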

    This fixes the immediate problem, and you can also add polish in a few areas.

    Do not use $1, $2, and friends unconditionally! Always test that the pattern matches before using capture variables. For example

    if (/(foo|bar|baz)/) {
      print "got $1\n";
    }
    else {
      print "no match\n";
    }
    

    An unprotected print $1 can produce surprising results that are tough to debug.
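
One way the surprise can show up, sketched with made-up inputs: a failed match leaves $1 untouched, so it still holds the value from the last successful match in the same scope.

```perl
use strict;
use warnings;

my ($first, $second) = ("bar", "qux");

$first =~ /(foo|bar|baz)/;      # succeeds and sets $1 to "bar"
my $after_first = $1;

$second =~ /(foo|bar|baz)/;     # fails, so $1 is left untouched
my $after_second = $1;          # still "bar", not undef

print "after first:  $after_first\n";
print "after second: $after_second\n";
```

Both lines report bar, even though the second string contains none of the alternatives. Wrapping each use of $1 in an if on the match avoids this.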

    Judicious use of Perl’s defaults can help emphasize the computation and let the mechanism fade into the background. Dropping $vm in favor of $_ as the implicit loop variable and implicit match target makes for a nicer result.
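
A small sketch of the implicit-variable style, using made-up input lines: the loop aliases each element to $_, and a bare match operator tests $_ without naming it.

```perl
use strict;
use warnings;

# Hypothetical input; only some lines start with a number.
my @lines = ("12 alpha", "no number here", "34 beta");
my @ids;

for (@lines) {
    # No explicit variable: the match binds to $_ by default,
    # and $1 is only read when the match succeeded.
    push @ids, $1 if /^(\d+)/;
}

print "@ids\n";
```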

    Your comments merely translate from Perl to English. The most helpful comments explain the why, not the what. Also keep in mind Rob Pike’s advice on commenting:

    If your code needs a comment to be understood, it would be better to rewrite it so it’s easier to understand.

    In the assignments from %+, the quotes don’t do anything useful. The values are already strings, so remove the quotes.

    my $id   = $+{id};
    my $name = $+{name};
    

    Below is a modified version of your code that captures everything after the number but before [store] into $name. The utf8 pragma declares that your source code contains UTF-8. It says nothing about your input, which is a common misconception. The test below uses a canned echo to simulate the output of vim-cmd for the Swedish VM.

    As Tom suggested, I use the Encode module to decode the output that arrives through the SSH connection and encode it for the benefit of the local host before printing it out.

    The perlunifaq documentation advises decoding external data into Perl’s internal format and then encoding any output just before it’s written. I assume that the value returned from $ssh->capture(...) uses UTF-8 encoding, that is, that the remote host is sending UTF-8. We see the expected result because I’m running a modern distribution of Linux and ssh-ing back to it, but in the wild, you may be dealing with some other encoding.

    You’re able to get away with skipping the calls to decode and encode because Perl’s internal format happens to match the encoding of the hosts you’re using. In general, however, cutting corners can get you into trouble, as these perlunifaq entries explain:

    • What if I don’t decode?
    • What if I don’t encode?
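
To make the difference concrete, here is a minimal sketch that assumes the remote side sends UTF-8. The octets are written as hex escapes so the example does not depend on the encoding of the source file, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The three letters a-ring, a-umlaut, o-umlaut as UTF-8 octets (assumed
# wire encoding): six octets, two per character.
my $octets = "\xC3\xA5\xC3\xA4\xC3\xB6";
my $octet_count = length $octets;

# LEAVE_SRC keeps decode from emptying $octets, which it may otherwise
# do when a CHECK argument such as FB_CROAK is supplied.
my $chars = decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
my $char_count = length $chars;

# A pattern that counts characters only behaves as expected after decoding.
my $raw_is_three     = $octets =~ /^.{3}$/ ? 1 : 0;
my $decoded_is_three = $chars  =~ /^.{3}$/ ? 1 : 0;

# Encoding just before output restores the original octets.
my $round_trip = encode('UTF-8', $chars);

print "octets: $octet_count, characters: $char_count\n";
print "raw matches .{3}: $raw_is_three, decoded matches .{3}: $decoded_is_three\n";
```

Without the decode step, the regular expression engine sees six separate octets instead of three letters, which is one way character-based patterns can go wrong on undecoded input.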

    Finally, the code!

    #! /usr/bin/env perl
    
    use strict;
    use utf8;
    use warnings;
    
    use Encode;
    use Net::OpenSSH;
    
    my %ssh_options = ();
    my $ssh = Net::OpenSSH->new('localhost', %ssh_options);
    
    # Create an array and capture the ESX\ESXi output from the current server
    #my @getallvms = $ssh->capture('vim-cmd vmsvc/getallvms');
    my @getallvms = $ssh->capture(<<EOEcho);
    echo -e 'JUNK\n416 TEST Box åäö!"'\\'\\''*#    [Store] TEST Box +w6XDpMO2IQ-_''_+Iw/TEST Box +w6XDpMO2IQ _''_+Iw.vmx   slesGuest    vmx-04'
    EOEcho
    shift @getallvms;
    
    for (@getallvms) {
      $_ = decode "utf8", $_, Encode::FB_CROAK;
    
      if (/^(?<id> \d+) \s+ (?<name> .+?) \s+ \[store]/mix) {
        my $id   = $+{id};
        my $name = $+{name};
        print encode("utf8", $id),   "\n",
              encode("utf8", $name), "\n",
              "\n";
      }
      else {
        print "no match\n";
      }
    }
    

    Output:

    416
    TEST Box åäö!"''*#
    
    