Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6105869
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 23, 20262026-05-23T14:01:06+00:00 2026-05-23T14:01:06+00:00

I’m writing a lambda calculus interpreter for fun and practice. I got iostreams to

  • 0

I’m writing a lambda calculus interpreter for fun and practice. I got iostreams to properly tokenize identifiers by adding a ctype facet which defines punctuation as whitespace:

struct token_ctype : ctype<char> {
 mask t[ table_size ];
 token_ctype()
 : ctype<char>( t ) {
  for ( size_t tx = 0; tx < table_size; ++ tx ) {
   t[tx] = isalnum( tx )? alnum : space;
  }
 }
};

(classic_table() would probably be cleaner but that doesn’t work on OS X!)

And then swap the facet in when I hit an identifier:

locale token_loc( in.getloc(), new token_ctype );
…
locale const &oldloc = in.imbue( token_loc );
in.unget() >> token;
in.imbue( oldloc );

There seems to be surprisingly little lambda calculus code on the Web. Most of what I’ve found so far is full of unicode λ characters. So I thought to try adding Unicode support.

But ctype<wchar_t> works completely differently from ctype<char>. There is no master table; there are four methods do_is x2, do_scan_is, and do_scan_not. So I did this:

struct token_ctype : ctype< wchar_t > {
 typedef ctype<wchar_t> base;

 bool do_is( mask m, char_type c ) const {
  return base::do_is(m,c)
  || (m&space) && ( base::do_is(punct,c) || c == L'λ' );
 }

 const char_type* do_is
  (const char_type* lo, const char_type* hi, mask* vec) const {
  base::do_is(lo,hi,vec);
  for ( mask *vp = vec; lo != hi; ++ vp, ++ lo ) {
   if ( *vp & punct || *lo == L'λ' ) *vp |= space;
  }
  return hi;
 }

 const char_type *do_scan_is
  (mask m, const char_type* lo, const char_type* hi) const {
  if ( m & space ) m |= punct;
  hi = do_scan_is(m,lo,hi);
  if ( m & space ) hi = find( lo, hi, L'λ' );
  return hi;
 }

 const char_type *do_scan_not
  (mask m, const char_type* lo, const char_type* hi) const {
  if ( m & space ) {
   m |= punct;
   while ( * ( lo = base::do_scan_not(m,lo,hi) ) == L'λ' && lo != hi )
    ++ lo;
   return lo;
  }
  return base::do_scan_not(m,lo,hi);
 }
};

(Apologies for the flat formatting; the preview converted the tabs differently.)

The code is WAY less elegant. I does better express the notion that only punctuation is additional whitespace, but that would’ve been fine in the original had I had classic_table.

Is there a simpler way to do this? Do I really need all those overloads? (Testing showed do_scan_not is extraneous here, but I’m thinking more broadly.) Am I abusing facets in the first place? Is the above even correct? Would it be better style to implement less logic?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-23T14:01:06+00:00Added an answer on May 23, 2026 at 2:01 pm

    (It’s been a year with no substantive answer, and I’ve learned a lot about iostreams in the meantime…)

    The custom facet exists exclusively to serve the string extraction operator in >> token. That operator is defined in terms of use_facet< ctype< wchar_t > >( in.getloc() ).is( ios::space, c ) “for the next available input character c.” (§21.3.7.9) ctype::is is simply a stub for ctype::do_is, so it would seem that do_is is sufficient.

    Nevertheless, recent versions of the GCC standard library do implement operator>> in terms of scan_is. The catch is that do_scan_is is then implemented as a series of calls to do_is, virtual dispatch and all. The header file describes do_scan_is as a hook for user optimization.

    So, it would seem that the as-if rule shelters an implementation that only provides the first override.

    Note that the second override, which retrieves mask values, is an odd one out. It could be implemented in terms of the first, by inefficiently building the mask bit by bit. In GCC it is implemented in terms of system calls, inefficently building the mask bit by bit with 15 calls per character. This seems to sacrifice both performance and compatibility. Fortunately it seems nobody uses it.


    Anyway, this is all well and good, but simply writing a tokenizer using streambuf_iterator<wchar_t> is easier, far more extensible, and simplifies exception handling.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

link Im having trouble converting the html entites into html characters, (&# 8217;) i
I've got a string that has curly quotes in it. I'd like to replace
I'm parsing an RSS feed that has an &#8217; in it. SimpleXML turns this
i got an object with contents of html markup in it, for example: string
I am writing an app with both english and french support. The app requests
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
That's pretty much it. I'm using Nokogiri to scrape a web page what has
I have just tried to save a simple *.rtf file with some websites and
I want to count how many characters a certain string has in PHP, but
I would like to count the length of a string with PHP. The string

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.