User talk:Dcoetzee/Wikicode/Specification

This pseudocode is no longer a proposed standard. See User:Dcoetzee/Wikicode.

This issue has generated much discussion, and here seems to be the best place to continue that discussion, but for background reading, you should also see Talk:Pseudocode, Talk:Algorithms on Wikipedia, and User talk:Dcoetzee/Wikicode.

(I've added the above for newcomers, like me, not existing protagonists. Mark Hurd 10:16, 10 Oct 2004 (UTC))

Discussion originally placed on project page follows

Some thoughts

(Style?) Instead of declaring type of variables after the name, perhaps they should be done before: integer left, integer right, integer modulus for example.
- This is a rather difficult issue; placing types before the name works fine for single-word types, but can become rapidly more complicated as types do, until it's no longer clear where the type ends and the variable name begins; the italics might help there though. Derrick Coetzee 23:31, 5 May 2004 (UTC)
  - Bold types? Dysprosia
    - Can we count on italics or bold being always available in the reader's browser? If not, some sort of flag will be needed. See comment below. ww 22:35, 6 May 2004 (UTC)

Comments should always have some kind of pre-delimiting thingy. Italicised comments solely on one line will be confusing (above, for instance, "Do something silly"). The italics don't matter so much.
- My previous complaint about this was bogus. I don't yet have a good solution for multiline comments, and I've mandated parentheses around comments now as well as italics. Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - They look great now :) Dysprosia 03:44, 6 May 2004 (UTC)
    - Can we count on italics or bold being always available in the reader's browser? I suggest some sort of delimiter (at both ends of the comment) as mandatory, with italics being strongly encouraged. It's always easier when reading to ignore something that's present (possibly logically superfluous, and inelegant too) than to supply an equivalent -- in this case delimiting marks. Easier to read... ww 22:35, 6 May 2004 (UTC)

I don't like this "var x" thing. It should not be the sole way of designating variables in a pseudocode form such as this. If one is demonstrating weak typing, sure, but otherwise, one should probably use a declaration form such as integer number, real your_bank_balance, boolean is_true, etc
- Ah, you're misunderstanding here. A type specification is allowed, as the examples with : int demonstrate, they're just not required. You may be looking for them before and not after (see second point). Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - No, I mean that in tandem with my other thought that types should come before the variable (so var x : int => integer x). var x can be used for weak typing examples. Dysprosia 03:44, 6 May 2004 (UTC)
    - I'm going to suggest that, as clarity is more important than ease of lexical analysis or parser construction, parameters be very clearly marked, as that's a source of difficulty in reading. I suggest something like IN: foo, bar OUT: foo, bar for params that are being passed by reference, and USES: beeble, brox for params passed by value. For similar reasons, it should probably be mandatory that params, variables and constants be explicitly typed. An old and ugly idea, for which apologies. More keystrokes, but better clarity for the reader. ww 22:35, 6 May 2004 (UTC)

Maps should be a little more general, not just mapping numbers to anything, but also mapping strings and other types to anything as well.
- On this point, I didn't mean to imply they were so limited; I'll expand the example set. Derrick Coetzee 23:31, 5 May 2004 (UTC)

Indenting shouldn't be so rigidly described however. The way wikicode is structured won't make things too unreadable.
- Done; is this more like what you had in mind? Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - A standard is reasonable, but mandatory doesn't quite cope with the odd cases which arise. Not sure how to do so, beyond exhortations for clarity, nice examples here, and some edit prowling by others. Fixed numbers of spaces are better than tabs, of course, in respect to readability, but present troubles in some fonts browsers may be using. Can we always assume availability of a monospaced font? ww 22:35, 6 May 2004 (UTC)

This is all just some random thoughts, so feel free to disregard what I say as you see fit ;) Dysprosia 23:09, 5 May 2004 (UTC)

I'll have to take another look at this later (just taking a quick re-examination atm) Dysprosia 03:41, 6 May 2004 (UTC)

Thanks a lot for your careful examination and comments, will make some related changes soon. Derrick Coetzee 23:31, 5 May 2004 (UTC)

I don't think brackets on a function call should be optional. Why have two ways to do the same thing? CGS 23:44, 5 May 2004 (UTC).

The idea is to reduce clutter in some cases; it's done this way in languages like ML, for example. It may in fact not be justified though. Change it if you like. Derrick Coetzee 00:06, 6 May 2004 (UTC)

Update: Done; I think you're right that the benefit of consistency outweighs any chance of added clarity here. Derrick Coetzee 03:13, 6 May 2004 (UTC)

I can see how something like this would be useful to illustrate simple algorithms. But even there, you can't necessarily compare the efficiency of two algorithms without knowing a little bit more about how data structures are meant to be implemented. For instance, "lists" in real programming languages are sometimes linked lists (as in Lisp) and sometimes vectors (as in Python). With linked lists, inserting an element in the middle is cheap, but taking the list's length is expensive; with vectors, it's just the other way around.
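The cost difference described above can be sketched in Python. This is only an illustration under stated assumptions: the names Node, linked_insert_after, linked_length and linked_to_list are made up here and are not part of the wikicode proposal, and the "cheap" linked-list insert assumes you already hold a reference to the cell you are inserting after.

```python
class Node:
    """One cell of a singly linked list (hypothetical helper)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def linked_insert_after(node, value):
    """Insert after a cell we already hold: constant time, one link changes."""
    node.next = Node(value, node.next)


def linked_length(head):
    """Taking the length means walking every cell: linear time."""
    count = 0
    while head is not None:
        count += 1
        head = head.next
    return count


def linked_to_list(head):
    """Collect the values into a Python list, for easy inspection."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


# Linked list 1 -> 3, then insert 2 after the first cell.
head = Node(1, Node(3))
linked_insert_after(head, 2)

# Python's built-in list is a vector: len() is constant time,
# but insert() in the middle has to shift every later element.
vector = [1, 3]
vector.insert(1, 2)
```

Both structures end up holding the same three values; what differs is which of the two operations had to touch every element.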

This is a pretty general problem; for the same reason it's not clear how efficient the map type is. My philosophy here is that the operations I describe are just an interface to an abstract data structure, and that in translating to an executable code (or performing analysis) you would choose an appropriate real data structure based on the set of operations being used. On the other hand this might be potentially confusing to the reader; I'm not sure which way to go on this. I certainly don't want a proliferation of collections for every concrete collection data structure. What do you think? Derrick Coetzee 15:43, 6 May 2004 (UTC)

Well, if we assume that everything is by-reference, you don't need linked lists built-in, you can do them with records just like in every high-school Pascal class. :P Let the built-in list type be like an extensible array or vector: the natural way to loop through it is to iterate over it with an index variable, not to cdr down it, but you can still push things on the ends like a deque. --FOo 20:52, 6 May 2004 (UTC)

I'd caution here that too much abstraction will inevitably cause comprehension problems. Recall that the Reader assumed in many cases will not be familiar with level of abstraction clarifying tricks and if so is more likely to be confused than helped. I have no solution to this, of course, but have repeatedly run across it when attempting to address said type of reader/listener. ww 22:35, 6 May 2004 (UTC)

To Fubar, I'd like to emphasise that the purpose of these types is just to simplify examples that deal with collections but don't fundamentally depend on how they work; linked lists would still be built by hand for example in the linked list article.

Here's the compromise I've come up with: each abstract type has associated synonyms that suggest particular implementations, and so assign effective costs to each operation that can be used in analysis. See the list section for my first example. Derrick Coetzee 22:53, 6 May 2004 (UTC)

I also don't see anything here about pass-by-value vs. pass-by-reference. This matters, too: can functions alter their parameters in the enclosing scope of the call?

Certainly required in some cases. C's management of it is abysmal, in my view. ww 22:35, 6 May 2004 (UTC)
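For readers unsure why the distinction matters, here is a small Python sketch. Python itself passes object references, so this only imitates the two behaviours; the function names are invented for illustration and nothing here is taken from the proposal.

```python
def append_in_place(items):
    """Acts like pass-by-reference: the caller's list is changed."""
    items.append(99)


def append_to_copy(items):
    """Acts like pass-by-value: work on a private copy, caller unaffected."""
    items = list(items)
    items.append(99)
    return items


data = [1, 2]

append_in_place(data)
after_reference_style = list(data)   # snapshot of the caller's list

result = append_to_copy(data)
after_value_style = list(data)       # the caller's list did not change again
```

A pseudocode specification has to say which of these two behaviours a reader should assume when a function modifies one of its parameters.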

At least one more operation on maps is needed: keys(map), to extract a set or list of the keys. And how is the map unassigned value distinct from the record null value? Is one meant to smell like a Lisp NIL and the other like an SQL NULL? :)

Yeah, I just noticed the lack of keys operation myself, fixed that. As for unassigned versus null, the reason null is not used for both is so that a key can map to null without being unassigned. Derrick Coetzee 15:22, 6 May 2004 (UTC)

That sounds a little confusing, it leads to the question, "If I can store a record null in a map, can I store a map unassigned in a record?" Thus, each different subscriptable type ends up with its own way of saying "nobody home"?

The way various real languages do it may be illustrative: In Perl, you can ask if a map key is defined. In Python, you can use the in operator on any collection type -- for maps ("dictionaries") it asks if a key is defined in the map. --FOo 20:52, 6 May 2004 (UTC)
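The Python behaviour mentioned above can be sketched as follows. The dictionary contents and the UNASSIGNED sentinel are assumptions made up for this illustration; Python's None stands in for the proposal's null.

```python
ages = {"alice": 30, "bob": None}    # "bob" is assigned, but maps to null

bob_is_assigned = "bob" in ages      # the key exists
bob_value_is_null = ages["bob"] is None
carol_is_assigned = "carol" in ages  # no such key

# dict.get() returns None for a missing key by default, so on its own it
# cannot tell "maps to null" apart from "unassigned" ...
ambiguous = ages.get("bob") is None and ages.get("carol") is None

# ... unless a distinct marker is used for "nobody home".
UNASSIGNED = object()                # hypothetical sentinel
bob_lookup = ages.get("bob", UNASSIGNED)
carol_lookup = ages.get("carol", UNASSIGNED)
```

This is the same split the proposal makes: a membership test (or a separate unassigned marker) answers "is there a key?", while null remains an ordinary value a key can map to.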

The keywords tagged union and cases are related to one another, but don't look related from the names. Perhaps type union and type case? --FOo 14:18, 6 May 2004 (UTC)

Thanks a lot for the feedback, will address your other points a bit later. Derrick Coetzee 15:22, 6 May 2004 (UTC)

Block delimiting

Instead of using indenting, it may be clearer to use begin and end blocking words to delimit the function body, if the block is larger than one line.

- This doesn't seem unreasonable; however, I don't really want to have end keywords for conditionals and loops, because of the useless space they occupy (without much added clarity). I'm following the style of Introduction to Algorithms in that. What do you think? Derrick Coetzee 23:31, 5 May 2004 (UTC)
  - I know and agree that they're ugly, but I think they are more clear, in that the code block is written more explicitly. Dysprosia 03:41, 6 May 2004 (UTC)
    - I would suggest indent as primary choice (easier to read...) and some block delimiter as alternative where indentation is confusing. Latter should be explicitly termed last resort in the spec, even though equivalent. ww 22:35, 6 May 2004 (UTC)

For block delimiting, I would prefer to use parentheses (like in ML) to indicate a list of statements. Shown below are examples without delimiters, with begin and end tags, and with parentheses.

 sum := 0
 while n > 0
     sum := sum + n
     n := n - 1

 sum := 0
 while n > 0
 begin
     sum := sum + n
     n := n - 1
 end

 sum := 0
 while n > 0
 (
     sum := sum + n
     n := n - 1
 )

Out of these examples, I like the parentheses best.... jaredwf 16:18, 8 May 2004 (UTC)

But with the indentation the parentheses are redundant. CGS 09:29, 9 May 2004 (UTC).

It is redundant, but I think it is clearer to have the extra space around the block to separate it from the surrounding code. Either way is fine with me, but I still prefer the well-spaced code. jaredwf 09:50, 9 May 2004 (UTC)

Indentation is easily broken, especially when you are modifying the code. There is one TCL language that tries to do without blocking markers. It is a pain in the ass to fix cut'n'paste errors in it. Mikkalai 21:17, 13 May 2004 (UTC)

May I remind you of still another way, which saves one line of code:

 sum := 0
 while n > 0
     sum := sum + n
     n := n - 1
 endwhile

or a variation:

 sum := 0
 while n > 0
     sum := sum + n
     n := n - 1
 elihw

Moreover, a disadvantage of parentheses is that they visually "blur" the boundaries of a block, you know, kind of gray-scale halftones between black and white. Mikkalai 21:24, 13 May 2004 (UTC)

These examples look like Fortran 90/95/2003, where one would write

  do while (n > 0)
     sum = sum + n
     n   = n - 1
  end do

Fortran could be used as a model.

Whatever the approach, IMO a block termination keyword is a must. Suppose you had

 while n > 0
     sum := sum + n
     read_next(n)
 report(n)

By a single keystroke, I convert it into

 while n > 0
     sum := sum + n
     read_next(n)
     report(n)

It takes someone who knows the algorithm in depth to catch the bug. Mikkalai 23:31, 13 May 2004 (UTC)
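The effect of that one keystroke can be reproduced in Python, which also relies on indentation alone. This is a sketch under assumptions: read_next is imitated by pulling values from an iterator, report is imitated by appending to a list, and the input is assumed to end with a non-positive value so the loop terminates.

```python
def report_after_loop(readings):
    """report(n) sits outside the loop, as in the first listing."""
    source = iter(readings)
    reported = []
    total = 0
    n = next(source)
    while n > 0:
        total += n
        n = next(source)       # stands in for read_next(n)
    reported.append(n)         # stands in for report(n)
    return total, reported


def report_inside_loop(readings):
    """Identical code, except report(n) is indented one level deeper."""
    source = iter(readings)
    reported = []
    total = 0
    n = next(source)
    while n > 0:
        total += n
        n = next(source)
        reported.append(n)     # now runs on every iteration
    return total, reported


once = report_after_loop([3, 2, 5, 0])
every_time = report_inside_loop([3, 2, 5, 0])
```

Both versions compute the same sum, so a test that only checks the total would not notice that the second one reports three times instead of once.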

Just to add my opinion to the blocks debate: I think the maintenance concern is, frankly, moot, because we will not be writing large amounts of code using this. Keep that in mind. We will certainly be changing the code, but it should generally be expected to fit comfortably on a screen, if not less. I generally favour indention alone for this reason.

As for Mikkalai's example, it's true that a relatively small edit could effect a big difference in behaviour, but the resulting problem is highly visible, and so should be quickly fixed. I've seen sneaky vandalism on Wikipedia that was much more subtle than this, and still got quickly fixed.

As my final point in favour of indention alone, I note that, when block delimiters are used, if a line of code were improperly indented, we would nevertheless consider this a potential readability problem and so something that we would have to fix anyway. The added flexibility is not only unnecessary but undesirable.

However, I also realise that many existing programmers are used to some sort of block delimiter, and so this helps them read code. Whether this outweighs the brevity and reduced clutter gained by omitting them, I am unsure. In any case, we would certainly use some form of punctuation, and not cluttersome begin/end keywords.

Derrick Coetzee 02:30, 14 May 2004 (UTC)

I like the endwhile example by Mikkalai since it saves some space, but still "blocks" off the block.... If we use block delimiters, "improperly indented" code may have a readability problem, but the code will still be correct and it will be obvious that there is improper indention. Like the example from above, if we use endwhile we have

 while n > 0
     sum := sum + n
     read_next(n)
 endwhile
     report(n)

and if we don't we get

 while n > 0
     sum := sum + n
     read_next(n)
     report(n)

It is extremely obvious from the first example that the indention is wrong, but for the second it is not as clear. The block delimiter only adds one more line and I think it aids clarity.

-- jaredwf 04:35, 14 May 2004 (UTC)

Instead of endwhile why not use Donald Knuth's proposal? Like so:

 while n > 0
     sum := sum + n
     read_next(n)
 repeat
 report(n)

The keyword repeat is nice because when you read to it, you realize you are reading the last statement of a looping block. This is better than "end" which could be confused with an if (also, it makes it sound like the loop will end, as if it were a break). I too agree with the others here who think that Python's experiment in formatting is a failed one at best, and that some delimiter (even if it's small like fi and repeat) is better than nothing. (Mind you, Python is an enjoyable programming language to program in, and it's very useful. I just think they made a bad decision with this one issue.) -- MShonle 19:15, 21 Dec 2004 (UTC)

Iteration

The control statements look good, though, perhaps for the foreach type code, one could actually write for each blah in...
- Done, good idea. Derrick Coetzee 03:12, 6 May 2004 (UTC)
  - I'd like to see the spec as follows: for each blah in some_container, since in some sophisticated algorithms one may want to write, e.g., for each blah in heap or in some custom-tailored container. (cf: STL iterators) Mikkalai 23:46, 13 May 2004 (UTC)
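As one possible reading of "for each blah in some_container", here is a Python sketch in which a custom container decides for itself what iteration means. The MinHeap class is hypothetical and written only for this illustration; it wraps the standard heapq module and yields items in ascending order without consuming the heap.

```python
import heapq


class MinHeap:
    """A small custom container that supports 'for each x in heap'."""

    def __init__(self, items=()):
        self._items = list(items)
        heapq.heapify(self._items)

    def push(self, item):
        heapq.heappush(self._items, item)

    def __iter__(self):
        # Pop from a copy so that looping does not empty the heap itself.
        # A shallow copy of a valid heap is still a valid heap.
        pending = list(self._items)
        while pending:
            yield heapq.heappop(pending)


heap = MinHeap([5, 1, 4])
heap.push(2)

visited = []
for item in heap:          # the loop never mentions how a heap is stored
    visited.append(item)
```

The loop body stays the same whether some_container is a list, a set, or something custom-tailored like this.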

This proposal is worthwhile and I think some such is overdue. Lots of work, but .... I see a potential problem however. In the interest of doing a really good job, there will surely be here the dreaded committee effect. Accretion of feature after beloved feature until the result is the camel designed by the horse design committee. This being WP, where all is hostage to all, I can't think of a solution.

Wirthian feature parsimony is fine in many respects, but not ideally suited for instant transparency. C character parsimony is certainly not. COBOLic verbosity is self-defeating. And so on. There may not be an ideal exemplar amongst real languages.

The proposal as it exists now is admirable. Best possible, certainly not -- there can be no way to identify it even if we had one. Perhaps a reasonable way would be to have an open period, followed by a vote, followed by discussion and changes until another vote. And repeat.

Sort of like Linus does with the kernel. But, who will bell the cat by taking the Linus role? The vote procedure at least has the virtue of precedent here on WP.

Comment? ww 22:54, 6 May 2004 (UTC)

You know what? Perhaps each of us should come up with a solution/wikicode proposal, and let the community vote on one proposal, and then chip away/discuss/improve that? Dysprosia 23:02, 6 May 2004 (UTC)

I agree that the way I've set it up right now is bound to introduce bias, not the least of which being the tendency to not change reasonable-looking things I wrote into the original proposal. I think the most important thing however is getting something through the process and into use - hopefully something good, but consistency is my main concern. Any method that would garner sufficient interest among computer science contributors to take a standard through to official status is good enough for me.

I was hoping that by encouraging others to edit the proposal itself we could avoid the bias of it being written by a single entity, but for some reason people seem unwilling to make even small changes, instead suggesting them in the discussion. We have a page history, don't we?

My remaining concern is a bit elitist, but I still feel compelled to voice it: when designing a pseudocode, people often favour features and even syntactical conventions of languages they know. I'm sure your average stock programmer would design a pseudocode disturbingly similar to C and Java. This does have the advantage of familiarity, which helps readability and writability, but may be worse in the long run, as popular languages change.

In any case, let's choose some method and stick with it. One possible method is to create a small number of proposals, individually improve each of them through the wiki process, and then merge them into a finished product; if we can get enough people behind the process to carry this out, I think it would be good.

Derrick Coetzee 17:58, 7 May 2004 (UTC)

The pseudocode would have to be an imperative language, right? Or, is a functional language a possibility? jaredwf 16:44, 8 May 2004 (UTC)

In my proposal I included a mechanism for closures, which may come in handy. There are three arguments I would make for an imperative pseudocode, in order of importance:

- Traditionally, most pseudocodes in common use, such as in reference works, are imperative.
- More programmers are familiar with imperative languages than functional languages.
- It's easy to embed most functional code in an imperative language, by writing a function with only one return statement. (One may wish to "manually" eliminate tail recursion, or else state in the text that it is being eliminated.)

Derrick Coetzee 01:00, 12 May 2004 (UTC)
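The third argument can be illustrated with a short Python sketch. Euclid's algorithm is my own choice of example, not one from the proposal: the first function is functional code embedded as a single return statement, and the second is the same function with the tail call eliminated by hand.

```python
def gcd_functional(a, b):
    """Functional style: the whole body is one return expression."""
    return a if b == 0 else gcd_functional(b, a % b)


def gcd_loop(a, b):
    """Tail recursion eliminated manually: the recursive call
    becomes a reassignment of the parameters inside a loop."""
    while b != 0:
        a, b = b, a % b
    return a


functional_result = gcd_functional(48, 18)
loop_result = gcd_loop(48, 18)
```

The rewrite is mechanical here because the recursive call is the last thing the function does.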

Objection to the whole idea of wikicode

Don't start from scratch. Start from Python. It already looks like pseudocode, and supports both imperative and functional paradigms. I can't think of any reason the pseudocode presented here is superior to Python for any of the features Python supports.

Certainly, Python has the one big advantage that people can test the correctness of the pseudocode before they publish it. You can't do that with a fictional pseudocode, and therefore all such code would be perpetually untrustworthy. -Doradus 02:53, 9 Jun 2004 (UTC)

Wikipedia is not a Python library, nor should it attempt to model things based on Python either. Every language has its own strengths, and it's not NPOV to use Python as a model or base.

Ideally, implementations of algorithms actually already in a programming language should go to Wikisource, and Wikipedia should have algorithms in pseudocode. Dysprosia 08:32, 9 Jun 2004 (UTC)

Let me be clear that we should not sacrifice one iota of clarity or neutrality to make our code examples valid Python code.

Having said that, what is the advantage of gratuitous variations from Python such as omitting the colons? It doesn't make the code any clearer, and it may take code that is otherwise valid Python (and therefore testable and unambiguous) and make it impossible to run. I can't see what purpose that serves. -Doradus 20:10, 9 Jun 2004 (UTC)

My opinion on this: Python is fairly readable, but contains as many eccentricities and unnecessary syntax as any other real programming language. I think a pseudocode should take advantage of the ability of humans to use reasonable context-sensitive interpretation. However, even more than this I object to your statement that code that cannot be compiled and run is not trustworthy. It is common practice in many environments to statically prove the correctness of programs, and this provides much better guarantees of correctness than any number of test runs. Derrick Coetzee 16:09, 10 Jun 2004 (UTC)

You're preaching to the choir regarding the need for reasoning about program correctness. However, your objection only makes sense if test runs preclude static correctness proofs, which they do not. The ability to run a piece of code and verify that it works can only increase the odds that it will be error-free. The relative quantity of bugs each approach can uncover depends critically on the rigor of the proof and the skill of the prover, so I question your assertion that static proofs must necessarily provide a better guarantee of correctness.

Remember Knuth's famous quote "Beware of bugs in the above code; I have only proved it correct, not tried it."

Anyway, I admit "perpetually untrustworthy" was a tad strong.

Oh, and my opinion is that Python contains far, far fewer eccentricities than most other languages, but I grant that it is only an opinion. -Doradus 14:41, 11 Jun 2004 (UTC)

I said it before and I'll say again... While NPOV is important, I can't see the logic in inventing a brand new language that looks almost identical to Python but isn't actually executable Python code. It's like inventing a whole new mathematical language because if I write $a/b$ then I might alienate people who prefer $\frac{a}{b}$ or a÷b. Or rolling our own pronunciation guide system because choosing SAMPA would be biased against those who prefer the International Phonetic Alphabet.

If you open up old CS texts, you may find examples written in ALGOL or COBOL, languages that would certainly not be one's first choice today. You may also find others written in a myriad different and independently-invented pseudocode notations, with subtle differences and subtle ambiguities. Given the choice, I'd take the ALGOL or COBOL, because if you find that you don't understand a piece of made-up pseudocode, where do you turn for further clarification?

Now, we don't have to go with Python per se; it's just that Python is so similar to the pseudocode we are using that it presents itself as the obvious choice. If we can find a better existing language, let's start with that instead! And if we find flaws in Python that make it unsuitable for use as pseudocode, let's use a "purified" variant of Python and call that wikicode. But let's not gratuitously inflict yet another arbitrary pseudocode on the world.

"Wikipedia is not a Python library" -- perhaps, but it stands to become something even less useful: a library of un-executable, ambiguous, probably-correct-but-never-actually-tested code. --Doradus 04:13, Sep 22, 2004 (UTC)

Objection to Wikicode AND Python

Wikicode does not fit the definition of pseudocode, which in computing is defined as 'using language similar to plain English to explain the purpose of each line of a program' (or something along those lines). In other words, it would be something like

Display "Welcome to my program" on the screen
If user presses 'Spacebar' then run main program
End If
If user presses 'X' then close program
End If

Making up a pretend programming language does NOT fit the definition of pseudocode and does NOT perform any useful purpose on wikipedia. Python would meet with the same problems as I have outlined above. If an article requires pseudocode, it requires PSEUDOcode, not programming code! --Cynical 15:51, 24 Sep 2004 (UTC)

You're completely right. I think we've lost sight of the "pseudo" in pseudocode. That said, there are places where code (instead of pseudocode) is appropriate. In those places, I don't see that we have anything to gain from using Wikicode as opposed to executable code of some form. Wile E. Heresiarch 18:12, 24 Sep 2004 (UTC)

I don't think we should be forced to code in any particular language or style for our code/pseudocode. --Improv 04:34, 30 Sep 2004 (UTC)

Slightly More Universality?

Hello, all. :) I think this is a wonderful project, and the approach is well organized. I have a few thoughts:

At the top of this page, a stated goal is "Universality: code should be clear to programmers familiar with nearly any common language, and not too similar to any one." I am a little concerned with code that is only clear to "programmers." I think that some effort should be made to create a syntax that is slightly more intuitive to beginning programmers and to non-programmers who are simply interested in algorithms.

I agree with this statement; I've tried to use some basic familiar math notation and to stick to English and fairly ubiquitous terminology, but there's always room for improvement. I address some of your specific points below. Derrick Coetzee 17:59, 25 Aug 2004 (UTC)

For example, instead of Wirth's := operator, how about an "arrow" showing the direction of assignment/data movement:

n <- 5

The use of the word "function" seems problematic to me, both because of the many different names that are used, (function, subroutine, procedure, handler, etc.), and because of possible confusion that may arise with mathematically-inclined, yet non-programming-inclined readers. The article at subroutine uses an interesting and nonpartisan word: "subprogram".

subprogram sort( list, comparisonPredicate )
 (indented body of function)

I've had a few more thoughts... but I'll wait to see what the reaction to these two are. :) AdmN 06:02, 25 Aug 2004 (UTC)

The arrow is certainly used in some languages, but it is confusable with the use of < and - as arithmetic operators. n<-5 looks like a Boolean expression to me: "is n less than negative five?" Since the colon does not have the ambiguity that the left angle bracket does, := seems clearer. Heck, if we wanted to we could use the AppleScript notations set n to 5 or copy 5 to n ... :)
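The ambiguity is easy to demonstrate in a language where < and unary - are operators. The sketch below uses Python purely as an example of such a language; the variable name n and the value -7 are arbitrary.

```python
import ast

n = -7

# Read as "n less than negative five": a comparison, not an assignment.
looks_like_assignment = n<-5

# The parser agrees: the expression is a comparison node whose right-hand
# side is a unary minus applied to 5.
tree = ast.parse("n<-5", mode="eval")
is_comparison = isinstance(tree.body, ast.Compare)
rhs_is_negation = isinstance(tree.body.comparators[0], ast.UnaryOp)
```

After this runs, n is still -7; nothing was assigned.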

I already use some non-ASCII symbols in the pseudocode. Arrow has an HTML entity, and it could be used like this, addressing your objections at the cost of a bit of editability:

  x ← 2

This may not be very pretty in some fixed-width fonts though. What do you guys think? Derrick Coetzee 17:59, 25 Aug 2004 (UTC)

As for the word "subprogram", I do not think it is used in any language that readers are likely to recognize. As I recall, Wikipedia had a single editor a while ago who went through and put this word on several pages where commoner words were used before. I don't see a benefit to it. The Pascalish distinction between "functions" that return one value and "procedures" that return none seems likewise arbitrary -- after all, there are languages in which a "function" can return anywhere from zero to 255 values (Common Lisp). The term "function" for a routine that is called with arguments and possibly returns a value seems well-established and safe today. --FOo 13:05, 25 Aug 2004 (UTC)

I agree with this; for any laymen who have doubts, the mandatory link to the pseudocode description page should set them straight. Derrick Coetzee 17:59, 25 Aug 2004 (UTC)

OK. :) Mandatory links to clear and concise documentation should be enough.

With regard to AppleScript: after 8 years of being forced to use this silly language, I wouldn't wish it on my worst enemy... ;-)

Hmm... though AppleScript's "repeat" might be a better self-documenting term than "for", as in

repeat with i from 1 to 10

...or something less wordy. :) AdmN 18:21, 25 Aug 2004 (UTC)

How about, "for each i between 1 and 10"? I imagine that's how one would say it in English - but could be confused with the for each loop. Derrick Coetzee 19:40, 25 Aug 2004 (UTC)

No, not "between". Then you have to say "inclusive" or else people will wonder if 1 is "between" 1 and 10. I think "for each i from 1 to 10" is shorter, clearer, and describes the sequencing better than "between." --Doradus 04:17, Sep 22, 2004 (UTC)
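The inclusive reading of "from 1 to 10" is worth pinning down because real languages differ on it. As a sketch, the helper from_to below is a made-up name (not a Python built-in) that gives the inclusive meaning, next to Python's own range, which excludes its upper bound.

```python
def from_to(first, last):
    """Inclusive on both ends, matching 'for each i from first to last'."""
    return range(first, last + 1)


inclusive = list(from_to(1, 10))     # 1, 2, ..., 10
half_open = list(range(1, 10))       # Python's range stops at 9
```

A reader who knows only one of the two conventions could be off by one, which is the argument for wording that does not leave the bounds open to interpretation.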

Perhaps this discussion should be archived and a new one started? That way we don't have discussions on old discussions. wrp103 (Bill Pringle) - Talk 14:09, 22 Sep 2004 (UTC)

I like "for each", it is much clearer than "for" by itself, as in "for ( init; bool; incr )". The word "between" might almost suggest exclusivity, as in "a proper fraction is a value between 0 and 1". AdmN 19:54, 25 Aug 2004 (UTC)

"Sentence" operations

It occurred to me, thinking of AdmN's concerns, that it might be preferable to avoid putting so many operators in a single C-style function call notation, and instead to use sentences or phrases where appropriate, inserting variable names in reserved slots, in a manner more similar to Smalltalk and English text. Example:

Template:    insert value at end of list
Use example: insert x at end of primesList

I would also allow (but not mandate) the definition of functions that can be called in this manner:

 function kill numBunnies : int bunnies using weapon
     (bunny-slaughtering code)

What are your thoughts on this? Derrick Coetzee 18:20, 25 Aug 2004 (UTC)

With regard to my own concerns, I think that anyone who is interested in algorithms should be willing to learn an algebraic syntax, so long as it isn't overly complicated. The problem with English-like syntaxes, (take it from a guy who knows), is that they can lead to the same ambiguity of thought in the reader's mind as actual English can. However, now allow me to contradict myself, and point out that, when used ONLY for writing pseudocode, AppleScript is actually quite nice... it's only in the REAL world that AS sucks. ;-)

-- Notes:
--     AppleScript 'lists' and strings are 1-based.
--     '<' and '>' can be written as "less than" and "greater than"
--     '<=' and '>=' can also be expressed with 1 character in MacRoman and Unicode,
--         as well as by the longer statements "less than or equal to", etc.
--     Possession can be written in two forms:
--         item i of a
--         a's item i
--
on insertion_sort( a )

    set n to length of a

    -- Initially, the first item is considered 'sorted'
    -- i divides a into a sorted region, x < i, and an unsorted one, x >= i
    --
    repeat with i from 2 to n

        set v to item i of a -- Select the item at the beginning of the as yet unsorted section

        set j to i -- Work backwards through the array, finding where v should go

        repeat while item ( j - 1 ) of a > v -- If this element is greater than v, move it up one

            set item j of a to item ( j - 1 ) of a
            set j to j - 1

            if ( j <= 1 ) then exit repeat

        end repeat

        set item j of a to v -- Stopped when a[ j - 1 ] <= v, so put at position j

    end repeat
end insertion_sort

AppleScript also has the advantage of being verifiable... if you have a Mac and don't mind using the world's slowest language. ;-) I'm not really advocating this, I like what User:Dcoetzee has done here, (especially since AS isn't going to help anyone learn bit-wise operations, static-typing techniques, etc., etc., etc.). AdmN 19:22, 25 Aug 2004 (UTC)
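For comparison, here is a rough Python rendering of the same insertion sort. It is a sketch, not a line-for-line translation: Python lists are 0-based where AppleScript lists are 1-based, and the "exit repeat" guard is folded into the loop condition as j > 0.

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    # Initially the first item is considered sorted;
    # i divides a into a sorted region a[:i] and an unsorted one a[i:].
    for i in range(1, len(a)):
        v = a[i]               # first item of the unsorted region
        j = i
        # Work backwards, moving each item greater than v up one place.
        while j > 0 and a[j - 1] > v:
            a[j] = a[j - 1]
            j -= 1
        a[j] = v               # stopped at j == 0 or a[j - 1] <= v
    return a


example = insertion_sort([5, 2, 4, 6, 1, 3])
```

Like the AppleScript version, this one can actually be run to check it.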

Cheat Sheet

The main page should eventually have a "cheat sheet", a concise listing of the conventions for quick reference by those who just need an aid to memory. I could start something like this later tonight, if there are no objections. AdmN 20:05, 25 Aug 2004 (UTC)

That sounds great. I'd encourage you to add this content to User:Dcoetzee/Wikicode (intended to be the "User's Guide" for readers and editors) rather than the proposal. Also, I changed the for syntax to for each i from 1 to 10. Derrick Coetzee 20:55, 25 Aug 2004 (UTC)

Comments wrp103 (Bill Pringle) - Talk 01:44, 29 Aug 2004 (UTC)

General Comments:

My preference would be something that is obvious to the unlearned reader. Therefore, I would prefer something that looks more like C/C++, Java, etc. rather than ML. Far more people are familiar with the first group than ML.

If you look at it more closely, you'll see that the resemblance to ML is, at best, superficial: mostly just the type specifiers. The indentation is more like Python and almost everything else is as in C. I mean, it is an imperative pseudocode. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

I didn't mean to say that it looked like ML, but rather that features unique to ML would be foreign and confusing to many non-programmers.

wrp103 (Bill Pringle) - Talk 22:55, 29 Aug 2004 (UTC)

I would think that clarity is more important than consistency / ambiguity. In other words, even if two variations are somewhat ambiguous, as long as a "reasonable person" could deduce the intent, that is fine with me. The intent is to convey information, not produce syntactically correct code.

I agree with this general principle. By preventing ambiguity, I mean code that two reasonable people could read in two reasonable ways. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

Specific comments:

I think that the use of parens for comments could get confused with function args. I would recommend double slashes for comments. Italics would be a good addition, but might be a problem if people forget the closing tags, so I'm not sure if italics would be worth the effort. Parens would be fine if the entire line was a comment.

As it turns out, the closing tags are a non-issue. In fact, this is what makes multi-line comments so frustrating: italics do not continue from line to line in an indented section. I think parens are pretty clear as long as every comment is at least two words and has some space before it; perhaps this is asking too much. I like parens because they should be clear even to non-programmers, from English text experience. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

I would rather see the data type before the formal arguments where necessary.

This was a common source of disagreement. The reason I chose to put types afterwards is that complex constructed types, such as function types and pointer types with type specifiers, can lead to some very confusing notation and ambiguities in the C type notation. I refer you to A Modest Proposal: C++ Resyntaxed. On the other hand, you could argue we should only be using simple types, in which case the C syntax is fine. What do you think? Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

Any experienced programmer will understand whatever notation is used. However, the non-experienced programmer is more likely to be familiar with Java or C syntax, which have "type name".

wrp103 (Bill Pringle) - Talk 22:55, 29 Aug 2004 (UTC)

I also would prefer some kind of delimiter for blocks, although for simple cases, indents alone might be fine.

How about allowing { and } as optional delimiters? Keep in mind that most code samples should be small. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

For example:

function add(int value, list)
{ // add value to end of list
    (indented body of function)
}

Statements can also be simply an English comment, ignoring all syntax issues, such as:

(add 5 to end of numberlist)

This is a good point. This should be made explicit somewhere. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

Variables need not be declared, nor initialized. Comments can clarify usage

ent := maxent // set entry to highest possible value

This is already stated. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

I would prefer := or = for assignment rather than ← because of the extra typing, and because it will be less obvious to a "newbie" how to type that.

I agree — := was used originally but changed to enhance readability. I think most readers understand := though, and it's easier to edit. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

The for loop should not be so similar to the for each

 for i = 0 to 10
 { // loop through table
     (do things with table)
 }
 for each x in list
 { // iterative through list
     (do something with list entry)
 }

This again was changed to enhance readability for non-programmers. If you can suggest an alternate wording that avoids this similarity but is still clear to non-programmers, this would be great. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

The two for loops I listed above were fine, IMHO. What I objected to was the "for each x from 1 to 10" and the "for each x in list", which I thought were too easy to confuse. I think most non-programmers could tell the difference between the above loops, but not necessarily where both were "for each".

wrp103 (Bill Pringle) - Talk 22:55, 29 Aug 2004 (UTC)
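The distinction being argued over here, a counting loop versus a loop over the elements of a collection, can be made concrete in an executable language. The sketch below uses Python purely as an illustration. The function names `counting_loop` and `element_loop` are hypothetical, and Python counts from 0 where the proposal counts from 1.

```python
def counting_loop(table):
    """Counting form: the loop variable is an index into the table."""
    visited = []
    for i in range(len(table)):
        visited.append((i, table[i]))
    return visited


def element_loop(items):
    """Element form: the loop variable is the entry itself, with no index."""
    visited = []
    for entry in items:
        visited.append(entry)
    return visited
```

In the first form the body sees a number and must index the table itself. In the second it sees only the entry, which is the difference the two keywords are meant to signal to a reader.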

I would prefer simple characters for operators. Again, it is less typing, and a "newbie" would know how to type it.

There is a trade-off here between readability and editability. I think operators like ∈ could be replaced with in but operators like ≠ are difficult to replace in a universal, clear way. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

I think <>, <, <=, etc. are perfectly clear, and easy to enter.

wrp103 (Bill Pringle) - Talk 22:55, 29 Aug 2004 (UTC)

I also don't think we want a large collection of "standard" operators. If a given topic will be using any specialized operators, provide a list at the top of the article explaining their uses. (I would recommend this even if we do settle on a standard set of operators. The reader may not know what they are.)

The mandatory link to the wikicode description partly serves this purpose. However, if the names of the operators are both immediately obvious in their intent and standardized across articles, I think this is ideal. Perhaps we should rename some of them. Any particular ones? Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

Rather than trying to define a global set of functions (such as the list and set functions), I would suggest that an article could provide a list of functions applicable to the subject matter.

For example, on the Linked List page, we might have:

function isGreater(node1, node2) // true if value of node 1 > node 2

Function definitions would only be necessary if they were germane to the topic.

I like this idea. In this way the relevant article for each datatype could describe its interface in other articles, and they link to it. A summary of them all could be placed on the Wikicode page. Is this what you had in mind? Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

I'm not making myself clear. I'm looking at this from the perspective of a newcomer, not people who have agreed upon a standard pseudocode. I am concerned that if the "standard" gets too complex / comprehensive / etc. that it will discourage new people from contributing. By having too many "standard" operations, functions, etc., we are making the contribution of content more difficult for newcomers.

For example, I read about Wikipedia when it first started, visited the site, didn't find much, and wandered off. A month or so ago, I read another article, and decided to look at it again. I typed in "Cheryl Wheeler" and found a page with a small paragraph or so. Since I'm a big fan of hers, and since I run her web site, I decided to add to the article. I knew nothing about how to do it, but I hit "edit", looked at what was there already, and started typing. That was because, basically, all I had to do was type in the content and not worry too much about formatting, etc. It looked like I just had to type in English. Granted, code samples are different, but I still want to avoid scaring off newcomers. I don't think that one sample using = and another using := would be a source of confusion.

My concern in a large standard is that somebody new to Wikipedia, but familiar with the topic, will be discouraged from contributing because they can't figure out the standards. Furthermore, I'm not convinced of the need for a "hard and fast" standard. If you and I each write articles on related subjects, but I use = and you use :=, and if you have one set of assumed routines and I use a different set of assumed routines, as long as we both list our assumptions, I doubt if anyone will get confused.

Going back to my first few contributions to Wikipedia, people corrected much of my initial contributions, but the rules were pretty straightforward, and I sort-of caught on quickly. The more complex the standards are, the harder it will be to understand, and the less likely people will contribute. The results will be great only if a lot of people are encouraged to participate, not if a few people agree on a standard and fix up everyone else's contributions.

wrp103 (Bill Pringle) - Talk 22:55, 29 Aug 2004 (UTC)

Another concern I have is I've seen far too many things fail under the weight of their success. Right now, it may not seem like a lot of work to fix up somebody's pseudocode. However, as more and more people contribute, it will be harder and harder to keep up with the contributions. Far better, IMHO, are a lax set of standards where all that is needed is to fix up the particularly confusing sections.

wrp103 (Bill Pringle) - Talk 22:55, 29 Aug 2004 (UTC)

I have reasons behind most of the above suggestions (not just because I think so), and would be glad to expand on any of them if asked. (It's getting late, and I have to get up early tomorrow morning.)

I think the thing we should remember is that the reader may not be familiar with whatever standard we set forth, and we should not require them to look up the standard to understand the examples.

Agreed; clarity is paramount. Derrick Coetzee 17:35, 29 Aug 2004 (UTC)

Not just clarity. Simplicity. The more involved the standard, the less simple the content. The goal should be "how can we write this so that people can understand it and want to contribute" rather than "how can we make this consistent across different pages."

I would prefer a standard like "list all assumptions at the top of the article" rather than a description of what the pseudocode would look like. If two articles have two different assumptions, that's okay, as long as they identify them.

wrp103 (Bill Pringle) - Talk 22:55, 29 Aug 2004 (UTC)

Okay, just to avoid getting too bogged down in arguments over details, let me say:

I think the goal of keeping the pseudocode as readable and editable as possible is important. Our essential conflict is that of who our audience is. You're thinking primarily of relatively inexperienced C programmers. I'm thinking of them, but I don't believe it would be good in the long run to repeat errors in C's syntax (such as its horrible type notation, the =/== confusion, and case fall-throughs) solely for consistency. I also think, as others suggested, that constructs should lean more towards English and high school math notation where possible, in order to extend readability to a larger readership beyond programmers. Do you find these notions reasonable? If so, which syntax issues do you think deserve compromise for their sake?

I agree. I was not suggesting reproducing C syntax, but rather not adopting a syntax that would be foreign to newbies. I also don't believe that minor variations in how different people enter pseudo-code would detract. wrp103 (Bill Pringle) 03:03, 5 Sep 2004 (UTC)

Second: if you have a small change you believe would be beneficial, there's no need to discuss it here. Simply make it. It's still currently a proposal, not a standard. Many of your ideas for changes seem reasonable, and we can concentrate further discussion on more contentious points that I might choose to revert or re-change.

Will do. It might be a day or so before I get a chance to do it. A new semester starts up next week, plus a crunch project at work isn't leaving me with a lot of free time. wrp103 (Bill Pringle) 03:03, 5 Sep 2004 (UTC)

I hope this helps discussion and helps move the proposal towards something useful for everyone. RSVP, and be bold. Derrick Coetzee 01:25, 5 Sep 2004 (UTC)

As you have no doubt noticed, being bold is not a problem. ;^) wrp103 (Bill Pringle) 03:03, 5 Sep 2004 (UTC)

Gee, I had expected more changes, especially when you said I might not agree with some. ;^)

I think we are basically done with this standard. A couple of minor comments:

I started out using the K&R formatting for curly braces: opening brace on the same line as the function/if/whatever, and the closing brace on its own line. I switched originally because emacs automatically formatted it that way, but then got to prefer it because it is much easier to match open and closing braces. Using K&R, it was too easy not to notice that I forgot to include the opening brace. So, I would prefer having the open & close braces line up.

The reason I did this, although it may seem like a bad reason, is brevity. Lines that only carry braces carry very little information, reducing the amount of useful info you can put in the same space. Also, when a compiler won't see it, it really doesn't matter whether you drop an opening brace here or there or not (although I can't say I've ever done so — dropping closing braces is a more typical problem). In particular, if it causes a complete piece of code not to fit on someone's screen, this has been shown in studies to hurt comprehension. Derrick Coetzee 16:42, 21 Sep 2004 (UTC)

My thoughts on "add x to the end of the list" was that we should allow a descriptive statement, but wouldn't need to back it up with details. In particular, if you need to do something, but the details aren't germane to the discussion, I would like to see a vague description of what has to happen. True, that could be done with a comment statement, but that starts to confuse explanation with process. I would like to see pseudocode explain what happens, and comments to explain why it happens.

That's true. Comments seem to implicitly not do anything. I'll fix those changes. Derrick Coetzee 16:42, 21 Sep 2004 (UTC)

I think braces should be permitted in do-while loops to provide a consistent look-and-feel. Blocks should be consistently formed by delimiters or keywords, IMHO. (And I wouldn't object to keywords instead of delimiters.)

I do prefer braces to a keyword such as end, but the typical way of bracing a do-while loop breaks the "} on its own line" rule, which makes it not as consistent as it appears. Considering braces are now optional on nearly everything, there's not much consistency to worry about in the first place. That said, if someone wants to use braces on their do-while, it doesn't really hurt. I'll change this. Derrick Coetzee 16:42, 21 Sep 2004 (UTC)

Other than that, I have no problems with your changes. And, although I think my above ideas are valid, I could live with what is there now.

OK, what's next? ;^)

wrp103 (Bill Pringle) - Talk 14:39, 21 Sep 2004 (UTC)

Well, the next step would be to migrate this page to its new home and, more importantly, complete the "user guide" at User:Dcoetzee/Wikicode. Then we just hunt down articles using pseudocode and fix them up to comply. As it proliferates more people should notice, follow the link, and complain about it, at which point some kind of revisions might happen. Derrick Coetzee 16:42, 21 Sep 2004 (UTC)

Stepping loops in parallel

There are many applications for which one wants to loop over two or more collections in parallel. Consider, for instance, setting up a mapping for a simple substitution cipher, given plaintext and ciphertext alphabets. The "C-ish" way is to use an artificial index variable into both alphabets:

cipher := emptyMap
for index from 0 to size(plainAlphabet) - 1 {
    cipher[plainAlphabet[index]] := cryptAlphabet[index]
}

This introduces a great place for an off-by-one error, since size(list) is here the number of elements, not the index of the last element. It is also slow when the collection is a list, not an array; and it can have an error if cryptAlphabet is shorter than plainAlphabet.

An alternative is to have a syntax to step through several collections in parallel, like this:

cipher := emptyMap
for each plainLetter in plainAlphabet
and each cryptLetter in cryptAlphabet {
   cipher[plainLetter] := cryptLetter
}

This can be distinguished from a nested loop by the use of the keyword and each, which indicates that the loop is stepping over several collections in parallel.

If one or more of the collections is unordered (like a set) then the pairing-up of elements is unspecified. This could be used when you need to pair each of a set with some element of a different set, but do not care which one.

The loop terminates when the shortest collection is exhausted.

Here's an alternate syntax, avoiding the and each keyword:

cipher := emptyMap
for each plainLetter, cryptLetter in plainAlphabet, cryptAlphabet {
   cipher[plainLetter] := cryptLetter
}

Thoughts? --FOo 20:47, 23 Sep 2004 (UTC)
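As a point of comparison, the parallel-stepping behaviour described above (corresponding elements paired up, stopping when the shortest collection is exhausted) is what Python's built-in `zip` does. The following is a minimal sketch, not a proposal for Wikicode syntax. The function name `build_cipher` and the snake_case variable names are assumptions made for the example.

```python
def build_cipher(plain_alphabet, crypt_alphabet):
    """Map each plaintext letter to the ciphertext letter at the same position."""
    cipher = {}
    # zip steps both collections in parallel and stops at the shorter one,
    # so no index variable (and no off-by-one bound) is needed.
    for plain_letter, crypt_letter in zip(plain_alphabet, crypt_alphabet):
        cipher[plain_letter] = crypt_letter
    return cipher
```

If `crypt_alphabet` is shorter than `plain_alphabet`, the extra plaintext letters are simply left unmapped rather than causing an out-of-range access.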

The idea is good, but:

Your off-by-one error isn't really an issue, since wikicode arrays/lists are 1-based by default.
Efficiency isn't really an issue in pseudocode, although it might be if they try to translate it.
The syntax and each falsely suggests that you're iterating over all possible pairs of items from the two lists, rather than only corresponding pairs. The second syntax is better but I wouldn't say it's clear and obvious.
There's no precedent I know of for such a construct.

The C code has the advantage of making the "corresponding" part explicit and clear. Double-indexing is kinda ugly though. A better way is to avoid using parallel lists in the first place and use records instead:

 record cryptPair { plainLetter, cryptLetter }
 construct cryptList, a cryptPair list
 cipher := emptyMap
 for each pair in cryptList
     cipher[pair.plainLetter] := pair.cryptLetter

I don't know if this is convenient in the application you have in mind, though. Another possible more explicit syntax we could use:

 for corresponding plainLetter, cryptLetter in plainAlphabet, cryptAlphabet
     cipher[plainLetter] := cryptLetter

Another way is to add an operation that groups elements from lists into tuples, like ML's zip function:

 for (plainLetter, cryptLetter) in pairUp(plainAlphabet, cryptAlphabet)
     cipher[plainLetter] := cryptLetter

Wikicode doesn't have tuples (or pattern-matching) right now though, but they could be handy.

Derrick Coetzee 21:34, 23 Sep 2004 (UTC)
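The record-based and pairUp-based alternatives above can both be tried out in a real language. The sketch below uses Python for illustration only. `CryptPair`, `pair_up`, `cipher_from_records` and `cipher_from_pairs` are hypothetical names, and `pair_up` is assumed to behave like ML's zip, grouping corresponding elements into tuples.

```python
from typing import NamedTuple


class CryptPair(NamedTuple):
    """Record holding one plaintext letter and its ciphertext counterpart."""
    plain_letter: str
    crypt_letter: str


def pair_up(xs, ys):
    """Hypothetical pairUp: group corresponding elements into tuples."""
    return list(zip(xs, ys))


def cipher_from_records(crypt_list):
    """Record version: one loop over a list of CryptPair records."""
    cipher = {}
    for pair in crypt_list:
        cipher[pair.plain_letter] = pair.crypt_letter
    return cipher


def cipher_from_pairs(plain_alphabet, crypt_alphabet):
    """Tuple version: each pair is destructured in the loop header."""
    cipher = {}
    for plain_letter, crypt_letter in pair_up(plain_alphabet, crypt_alphabet):
        cipher[plain_letter] = crypt_letter
    return cipher
```

Both versions produce the same map. The record version names its fields explicitly, while the tuple version relies on position and needs destructuring in the loop header.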

There is a precedent in Common Lisp's loop keyword, probably the most flexible looping construct around. That is where I got the for...and notation, and why it doesn't seem to me to be confused with a Cartesian product or nested loop, which here would be for...for. If I remember correctly, C's for(;;) also allows complex initialization and increment steps using comma — although it is not commonly used.

One trouble with your record example is that constructing cryptList requires traversing the source lists once — and then actually generating the map requires traversing this list again. O(2n), possibly O(3n). With a loop construct that can step both source lists at once (or with the C-ish index variable) you're back down to the "natural" O(n).

I rather like the pairUp() idea, though it does mandate tuples and destructuring. Destructuring is not a bad thing to have, and it fits a concept from natural language: "If John and Mary are husband and wife, then at some point they had a wedding." But — and don't take me too seriously here! — if you are going to have tuples and destructuring in your pseudocode, why not just use Python? :) — FOo 22:21, 23 Sep 2004 (UTC)

Another objection to the whole idea of Wikicode

Hello. I see that the Python code in Pascal's triangle has been replaced by Wikicode. Frankly, it's not an improvement: it's not any clearer, and it can't be executed at all, by anyone. I agree that not everyone is familiar with Python; only about 100,000 people, is my guess. This is some orders of magnitude greater than the number of people familiar with Wikicode.

I see also that there's a list of "Pages needing conversion". Needing? Isn't that a bit strong, folks? Will converting into an officially uglified pseudocode improve any of those articles?

In closing: (1) Language design is hard, and there is nothing to gain by reinventing the wheel here, and a lot to lose. (2) If it ain't broke don't fix it.

Regards, Wile E. Heresiarch 06:08, 24 Sep 2004 (UTC)

I see your point, and while the "Wikicode" idea is interesting, its lack of verifiability may be a killer. Nobody can take a piece of Wikicode and run it in any system to determine that it does what it is claimed to do. There are no implementations, and not likely to be any.

The question, then, is whether the lack of verifiability trumps the idea of "compromise" which suggests that using any particular real language would express bias towards that language and against all others. I can't say.

It's true that using a given real language would make it easier for those proficient in that language to modify examples written in that language. However, I think it is a remarkable exaggeration to call this an NPOV violation as has been done above. Using (say) Pascal in a piece of sample code does not carry the implication that Pascal is a perfect language; that it is good for any particular real-world job; that other languages would not suffice for that example; or that people should eschew other languages in favor of Pascal. (If it did, of course, it would indeed be unacceptable for NPOV reasons.)

A thought: The idea of a "teaching language" — an actual, implemented language designed for teaching algorithms rather than for systems or applications programming — is so strong that several languages (including Pascal, LOGO, and BASIC) have been designed for this purpose. It is expected that students who learn such a language must not take it to be the end-all of programming, but go on to learn many other languages.

And yet any "teaching language", including an unimplemented one such as Wikicode, necessarily carries similarities to "real" languages. Wikicode is clearly a structured language (like Pascal or C) rather than a functional, stack-based, or object-oriented one. If the "NPOV" concern above were as serious as it is proposed to be, then Wikicode as described here would be unbearably biased against aficionados of Lisp, Scheme, ML, Haskell, Forth, C++, Java, Smalltalk, Perl ... and so on. This would make it at least as unacceptable as using Pascal or any other particular language.

What, then? I'm not sure. I agree with Wile E. that it is remarkably premature to go listing "pages needing conversion" to Wikicode. It may forever be premature. After all, there are plenty of programming-language and algorithmic concepts which cannot be illustrated in the language that we have described as Wikicode — say, continuations. More and more I have to suspect that the Right Thing for any given article will be to use whatever language lends itself to the example. —FOo 02:25, 25 Sep 2004 (UTC)

I certainly agree that there is no reason to convert code samples to wikicode. However, I don't think that is the intent of wikicode. The purpose of wikicode was to standardize on pseudocode, not to force all examples to be in wikicode. In fact, on a number of the pages that I worked on, there were some code examples written in a specific language along with examples in pseudocode, which I converted to wikicode. The code samples I left alone.

If a code sample uses a particular language, that is great, but in many cases the actual language adds some complexity that can be avoided by using pseudocode. For those cases, the suggestion has been made to use a standard pseudocode.

I started the list of pages needing wikicode, and it was based on pages containing examples using some form of pseudocode. If some actual code got replaced, then that was probably a mistake. The hope is that by creating a "standard" pseudocode, examples will become more accessible. The standard deliberately leaves a lot of leeway for those cases where something different is more suitable for explaining a concept. The purpose wasn't so much to lock an author into a single way of writing an example, but rather to provide a consistent presentation of pseudocode. wrp103 (Bill Pringle) - Talk 06:34, 25 Sep 2004 (UTC)

Um, the pages describing Wikicode don't agree with your good intentions. To begin with, the term "standard" is used repeatedly, and standards are generally understood to place limits on the range of allowed expressions. User:Dcoetzee/Wikicode/Pages needing conversion states "In order to standardize, all this pseudocode needs to be converted." User:Dcoetzee/Wikicode states "Whenever you write pseudocode in an article, this style should be used." Template:Wikicode used to state "The following is wikicode, a standard pseudocode for Wikipedia articles." It is quite clear that the "standard" does not leave a lot of leeway for those cases where something different is more suitable for explaining a concept -- in fact, at least one author has already felt constrained by the current proposal. See the comment by User:Pakaran dated 2004/09/24 at User talk:Dcoetzee/Wikicode. In fact, existing, working Python code has already been replaced by Wikicode (see Pascal's triangle) and there are at least two other existing Python examples on the list of pages needing conversion. While I am in general agreement with your more tolerant approach to pseudocode in articles, you're not describing Wikicode. Wile E. Heresiarch 16:27, 25 Sep 2004 (UTC)

It does seem that some people are interpreting this more stringently than my vision. The standard calls itself an informal standard, and it only applies to pseudocode, not actual code segments. I didn't do anything with the Pascal's Triangle, but if it was actual code, it shouldn't have been replaced (IMHO). To provide flexibility, I inserted an escape clause in the standard (under general guidelines): "The following guidelines are intended to provide a consistency across various examples that use pseudocode. Conformance to these standards is encouraged. However, exceptions are permitted where the deviations increase the simplicity and/or clarity of the example." In my opinion, if actual code isn't too complex, that's fine; otherwise if wikicode works, use it; otherwise, use something else, but explain what you are doing. wrp103 (Bill Pringle) - Talk 22:08, 25 Sep 2004 (UTC)

(1) User:Dcoetzee seems to be running the show here. He is certainly interpreting Wikicode much more narrowly than you. Probably the clearest expression of his attitude is to be found in the page history of pseudocode -- having discovered an older, alternate, much more informal pseudocode proposal, he erased it and replaced it with the statement "The standard pseudocode on Wikipedia is wikicode" [1]. The edit comment -- (Erased "standard", linked to official standard, see talk page) -- is quite revealing. (2) Again, I appreciate your more inclusive attitude, but I see "exceptions are permitted" is buried several paragraphs down from the top of User:Dcoetzee/Wikicode/Specification. I am pretty certain that casual readers are going to overlook it, or having seen it, they will understand it in a limited sense -- "exceptions are permitted" sounds like a statement from a judge. What's worse, toleration of alternate constructs is not mentioned at all on the main User:Dcoetzee/Wikicode page. I suspect that is not an accident. Wile E. Heresiarch 03:14, 26 Sep 2004 (UTC)

I understand your criticism, and I'm sorry the proposal is not more explicit about this (I will update it), but I do invite you to use your own constructions or any real language if you find it more appropriate in any particular case. This is simply a set of guidelines to follow where there is not a reason to do otherwise. I converted the example on Pascal's triangle because I found the wikicode version more readable, but if you disagree a revert is totally okay with me. You can also remove the other Python examples from Pages needing conversion if you feel the Python versions are acceptable. As for the pseudocode article, I incorporated ideas from this proposal, and I invited its contributors on the talk page to look at the new proposal and make any changes they wanted. But I don't believe such a standard belongs in the article namespace, and this one is more detailed and comprehensive, so with no objections I replaced it. I hope you don't consider this too bold, and I appreciate your comments. Derrick Coetzee 17:47, 1 Oct 2004 (UTC)

One more thing: was there ever actually a vote to make Wikicode official WP policy? If not, it isn't legitimate and should be on VFD right now. Cynical 20:42, 13 Oct 2004 (UTC)

Sure was. I put it on the main WikiProject Computing page and attempted to draw attention to it for several weeks, but nobody seemed to care. I only got a couple votes. I was hoping experimenting with putting it in some real articles would help it gather more attention from people who care about the issue. It is still only a proposed standard, even though it is in use. Maybe there should be a more highly publicized vote, mentioned on the Village Pump or something? I don't know. Derrick Coetzee 22:14, 13 Oct 2004 (UTC)

A bad idea

"Wikicode" is a bad idea. Pseudocode can be, and should be, a mish-mash of natural language and any or all programming languages. That's the whole idea. Alternatively, you can write in a real programming language with well-defined syntax and semantics. (Good code examples can be as clear as pseudocode, if written in the right language.) Wikicode seems to have all the problems of both, without the advantages of either.

Good languages for programming examples:

Imperative:

Python (looks like executable pseudocode, anyway)
Java (Looks a lot like C, good for flipping structures around)
C (Lots of people know this, good for showing low-level bit-twiddling and hardware-banging)
x86 assembler (best-known asm language nowadays, to show off really low-level stuff)

Functional languages:

Lisp
ML

-- The Anome 19:33, 27 Sep 2004 (UTC)

I might go so far as to say that Wikipedia should settle on several different languages for examples of concepts that map best into those languages. For instance, there are articles which refer to (or must eventually refer to) bit-level data structures. These could be best illustrated with C structs. Algorithms involving continuations might be best illustrated in Scheme, wherein continuation programming is natural. Discussions of the "design patterns" movement (itself an adaptation of C++ and Java programmers to their languages' strengths and weaknesses) can be illustrated with examples in those languages. And examples of relatively generic algorithms (like, say, sorting algorithms) can use Python, which is so close to "executable pseudocode" (as long as you stay out of metaclass hacking). —FOo 00:55, 28 Sep 2004 (UTC)

With the caveat that I am a new and naive user of wikipedia, my preference would be for each algorithm to be rendered in a variety of common executable languages. This would allow me to see the algorithm in a language with which I am familiar. It would also give me the opportunity to see the same algorithm expressed in a less familiar, but potentially better suited, language. Programmers will naturally wish to contribute an example of the algorithm in their favorite language, so contributions should not be a problem. As I see it, the major issue would be one of display. How can each algorithm be coherently displayed in multiple languages? I don't know the details of wikipedia, so I do not know what common options you have for addressing such issues. Ideally, I imagine I would like to specify a viewing preference that would default to my language of choice when reading pages with example code. I would also want a small listing of the alternative languages available that I could click in order to see the code rerendered. I don't believe that I would ever click on "wikicode". My current default preference would be Python. --01:46, 30 Sep 2004 (UTC)

x86 assembler? Are you insane? :) (Well, basic assembler shouldn't be a problem, but the whole x86 architecture is arcane as hell) Dysprosia 04:40, 30 Sep 2004 (UTC)

No, the whole idea of using x86 assembler would be to show the gnarliness of the low-level environment. You wouldn't use an assembler example for anything except something that needed assembler, which typically means breaking the usual procedure-call model of programming (context switches, interrupt handlers), or banging the hardware in some devious way. Alternatives: System/360 assembler (very nice, actually), ARM assembler, MIXAL, if we can use MIX as a reference architecture. I still prefer 80486-or-better x86 code: it's a realistic real-world example. -- The Anome 16:44, 30 Sep 2004 (UTC)

Once again, I do not recommend replacing any particular real-language example, unless the wikicode is clearer (which is of course a matter of debate for each particular such change). I also do not specifically exclude any construct that would be more useful to demonstrating the issue at hand. However, this is not true:

Pseudocode can be, and should be, a mish-mash of natural language and any or all programming languages. That's the whole idea.

This is not the idea behind pseudocode; inconsistency is the issue being attacked here, because it is confusing and unnecessary. While many textbooks have different pseudocodes, the good ones each stick to a single one consistently, with occasional useful extensions. The goal is not to force restrictions upon the editor, but to avoid frivolous differences between pseudocode in different articles, just as the general layout of an article is relatively standardized. Derrick Coetzee 17:55, 1 Oct 2004 (UTC)

Pseudocode should enable the student to focus on the algorithm being discussed, and ignore implementation issues. As such, while some guidelines for uniformity can help in quickly grasping the essence of simpler algorithms, they cannot be fully specified, but must be adapted to the problem domain. The developers of Wikicode have made an admirable effort, but I fear in quite the wrong direction. They have ended up with something that is becoming very close to a completely specified language. Instead of helping us ignore implementation, it focusses on it - but in a language that doesn't even have a compiler. (For example, one of my first thoughts when glancing at the spec was "oh, it's a typed language. I wonder what are the rules for casting?"). I recommend that the spec be renamed to "guidelines", and that they be greatly simplified, especially getting rid of all implementation-specific stuff like types. But above all it should be clear that something like "rotate each point 180° around the axis specified by the vector u" is perfectly good pseudocode (assuming the student has already covered rotations), whereas "for each point in pointset { point->rotate(u,180) }" is bogged down in implementation irrelevancies and therefore is not. Securiger 08:43, 14 Oct 2004 (UTC)
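(For comparison, here is a rough sketch of what the one-line English instruction above turns into once it is made executable. It assumes, purely for illustration, that points are 3-tuples and that the axis passes through the origin; the name rotate_half_turn is hypothetical. A half turn about a unit vector u maps p to 2(u·p)u − p, which is Rodrigues' rotation formula with the angle set to 180°.)

```python
import math

def rotate_half_turn(point, axis):
    """Rotate a 3D point 180 degrees about the line through the origin along axis."""
    length = math.sqrt(sum(a * a for a in axis))
    if length == 0:
        raise ValueError("axis must be non-zero")
    u = [a / length for a in axis]               # normalise the axis (assumption: through origin)
    dot = sum(p * c for p, c in zip(point, u))   # projection of the point onto the axis
    return tuple(2 * dot * c - p for p, c in zip(point, u))

# "rotate each point 180 degrees around the axis specified by the vector u"
points = [(1.0, 0.0, 0.0), (0.0, 2.0, 5.0)]
rotated = [rotate_half_turn(p, (0.0, 0.0, 1.0)) for p in points]
```

The executable version has to settle representation, normalisation and error handling, none of which the English sentence needs to mention. That is the contrast the comment above is drawing.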

non-binding straw poll

No standard

Kaihsu 15:55, 2004 Sep 27 (UTC)
Wile E. Heresiarch 16:44, 27 Sep 2004 (UTC)
The Anome 19:22, 27 Sep 2004 (UTC)
Yath 22:52, 27 Sep 2004 (UTC)
FOo 00:50, 28 Sep 2004 (UTC)
Cynical 20:41, 13 Oct 2004 (UTC)
Securiger 08:13, 14 Oct 2004 (UTC)
Mark Hurd 10:21, 14 Oct 2004 (UTC) Pseudocode should be high-level, just below English. Some guidelines or rules to follow only when there are disputes would be better.

No use of languages which are invented specifically for Wikipedia

Jamesday 17:21, 1 Oct 2004 (UTC) Showing real code is one of the strengths of the variety of examples we have, as is the variety of pseudocode styles.

Wikicode (as proposed) as standard instead of source code

no votes yet

Wikicode (as proposed) as standard for pseudocode examples only, source code examples left alone

wrp103 (Bill Pringle) - Talk 19:02, 27 Sep 2004 (UTC)
Dcoetzee, with small changes where necessary - Derrick Coetzee 17:56, 1 Oct 2004 (UTC)

Python as standard

Kaihsu 15:55, 2004 Sep 27 (UTC)
Alex Reicher
ciphergoth

Java as standard

no votes yet

C/C++ as standard

Enrico 13:49 2004 Oct 05 (UTC)

Something else

no votes yet

Changes

The proposal has been updated to address some of the issues the discussion here has raised. If there are still objections, I invite them. My purpose for Wikicode is only to be a helpful way of improving consistency across articles where possible without interfering with clarity, and not to restrict the editor in any way. I hope you will excuse my bold edits, and I invite your reverts and criticism.

To be more specific, my changes are:

Added prominent language to User:Dcoetzee/Wikicode and User:Dcoetzee/Wikicode/Specification explicitly noting that extensions and changes can be introduced wherever there is a reason to do so, and that wikicode does not restrict but is only strongly suggested.
Added a process to User:Dcoetzee/Wikicode/Pages needing conversion for noting pages that editors may feel should not be converted.

Derrick Coetzee 18:14, 1 Oct 2004 (UTC)

Derrick, I appreciate your efforts to address the issues raised here, but I still feel that no standard is required here. There is some diversity in the way pseudocode is used in different articles, but, frankly I don't see that the differences are anything more than cosmetic and I don't see there's any potential for confusion among the readers. Furthermore, Wikicode proposes to replace statements that are truly pseudocode (i.e., not anywhere near executable) with something that is like a C/Pascal hybrid which for some reason lacks a compiler; I think Wikicode has lost the "pseudo" in "pseudocode", and that is definitely a drawback. -- Doubtless some articles contain pseudocode that could be polished, and some articles contain pseudocode that could be improved by restating in executable code (I'm thinking of bit-twiddling functions like RC4 in particular), and some articles contain code in languages that seem suboptimal (I think there's some Javascript floating around). So there's work to be done to make improvements along these lines, but I don't see that any of this is accomplished by Wikicode. For what it's worth, Wile E. Heresiarch 21:57, 2 Oct 2004 (UTC)

I agree about RC4 and the Javascript. I find it a bit strange that people have accused wikicode of being too C/Pascal-like (perhaps Wrp's influence), of being too much like ML, and of being too much like Python. Regardless of this, though, from looking at a number of articles I can assure you the "cosmetic" differences in idiosyncratic pseudocode, while comprehensible to a reader with a decent background, are quite various, and reflect poorly on the Wikipedia. Part of wikicode's design is to attempt to look nice and increase uniformity. The idea is similar to that behind Wikipedia:Manual of Style and our loose conventions for article layout. Uniformity unites disparate articles into an encyclopedia.

Just to make myself clear, my original preference was for the standard to be more like what I wrote in the General Guidelines section - use lots of comments and describe not only what you are doing, but why. As Derrick probably recalls, I was not in favor of specifying a standard syntax, but if one was to be defined, it seemed to me that it should be more like C or BASIC so that the widest audience could understand the examples. Experienced programmers could probably look at any kind of pseudocode and understand what was being illustrated. Newbies, however, would miss a lot if it didn't look like something they had seen before. Nevertheless, I still feel that the use of real code segments is often preferable, and pseudocode mostly used where language details would tend to cloud the description. wrp103 (Bill Pringle) - Talk 18:24, 4 Oct 2004 (UTC)

Also, although the proposal doesn't emphasize this enough, the pseudo is not gone from pseudocode, and English explanations can be substituted for the more low-level code at any point, as well as any necessary extensions created. Wikicode is also in some respects more high-level than many real languages, such as with its unrealistic types. However, if you have suggestions for making the pseudocode more pseudo, I'd be happy to hear them.

Moreover, as others have pointed out, a reader who for any reason doesn't understand a particular pseudocode notation in a random article has nowhere to turn; even the talk page may not help if the original author is gone. With Wikicode at least they can consult the guide, view other articles using it, and consult people familiar with it. Real languages work in this regard too, and I don't discourage their use, where they are clear and brief. Indeed, in some cases even English mixed with real code may be more appropriate.

But where pseudocode is used, I'm recommending this. I'm not forcing any editor to use it, but I do believe the idiosyncratic pseudocodes should generally be changed to wikicode, for the reasons above. Awaiting your retort, Derrick Coetzee 01:19, 3 Oct 2004 (UTC)

I don't think you can have it both ways -- "I'm not forcing any editor to use it" isn't consistent with "idiosyncratic pseudocodes should generally be changed to wikicode". Although the proposal has been watered down somewhat, it's still called a standard, implying the desired effect is to limit the range of allowed expressions. There's an easy way to make pseudocode more pseudo, and that's to drop the "standard". There are certainly examples of code or pseudocode that can be improved, but I don't see that a pseudocode standard is the way to go about that. I guess I'm not against having a collection of example pseudocode constructs to help out anyone who wants to learn about pseudocode. For what it's worth, Wile E. Heresiarch 01:07, 4 Oct 2004 (UTC)

They're not inconsistent at all if you recall that Wikipedia articles are collaborative, and that those more familiar with the standard can convert pseudocode written by those who are not. Expression is only limited in a superficial (syntactic) way, since no semantic construct is prohibited from being used. In my apartment complex, tenants are prohibited from having blinds which are not white; they say this is for uniform external appearance, and makes little practical difference to the tenants. The purpose is similar here — ensuring a uniform syntactic appearance to readers by avoiding frivolous differences, while not introducing semantic restrictions. Derrick Coetzee 02:56, 4 Oct 2004 (UTC)

This doesn't do much to address the problems. Exposure to a variety of real languages and a variety of real pseudocode styles are major strengths which this proposal would eliminate or discourage. The use of any consistent language or pseudocode style should be strongly discouraged. Jamesday 13:28, 10 Nov 2004 (UTC)

First of all, I don't disagree with having a variety of real language examples in articles, often in addition to pseudocode examples, nor do I disagree with using alternate pseudocodes as objects of discussion, just as foreign language examples are used in English articles. What I do disagree with is having a variety of different pseudocodes, one invented for every article requiring pseudocode, where they could just as well be using one consistent one.

If every reader viewed only one article, then it wouldn't really matter what that one article used, but many readers view dozens of related articles from Wikipedia, and consistency in our pseudocode style is as important for improving our appearance and aiding their comprehension as having a consistent article format. There's a reason why virtually every textbook that uses pseudocode uses one consistent pseudocode throughout. Consistency across articles is what makes Wikipedia an encyclopedia and not a collection of disparate web pages. But I'm repeating myself. Deco 19:19, 11 Nov 2004 (UTC)

So far, the articles I've seen that use wikicode would be better served by using Python. In these cases, the differences between the text of the executable Python and the wikicode are superficial, while the ability to actually run the code and play with it is not. Pseudocode using natural language is an essential tool for describing algorithms and would certainly benefit from a style guide with examples of well-written pseudocode. Like English, the complexities of high-quality pseudocode necessitate a style guide as opposed to a rigid specification. Wikicode seems to have missed the mark by becoming too much like a computer language specification, and hence does more harm than good by replacing executable examples with an obscure and unsupported computer language. --19:34, 16 Nov 2004 (UTC)

Indentation style

We specify an indent style in the wikicode? That's awful. We should let writers use their own indent style. Dysprosia 12:24, 30 Dec 2004 (UTC)

There's a reason most companies specify an indent style in their code style standards. Since the point of the pseudocode is not functionality but consistent, uniform appearance, I think it makes sense to specify this sort of thing. It's also rather necessary to some degree when braces are optional. Deco 17:01, 30 Dec 2004 (UTC)

Like American vs British spellings, the indent style should be consistent in the article. Dysprosia 00:28, 31 Dec 2004 (UTC)

That's one opinion, but I think consistency across multiple articles is more important than any added "expressive" benefit of individual whitespace use. I see the American vs. British spelling as a concession to consistency for the sake of editor peace and the needs of regional readers on regional topics. But I see no way in which readers might benefit from several articles having wildly different indentation styles. No editor would be punished for indenting in a nonstandard way, it would just be fixed by another editor. Deco 01:46, 31 Dec 2004 (UTC)

The reader can begin to understand that not everyone uses the one indentation style. And we're not talking "wildly different", just keeping to the styles described in indent style.

Anyway, this is just an issue of annoyance, since there really wasn't much discussion about using K&R as standard in the first place. Dysprosia 08:01, 31 Dec 2004 (UTC)

My concession

It seems, at least based on the above responses, that the entire idea isn't receiving any support. But it is not wikicode itself I believe in here. My logic at its core is simply that gratuitous inconsistency is bad; this seems manifest to me. I appreciate the vast diversity of our community, but I think limiting superficial expression only improves uniformity, improves understanding, and enhances editors' chances for more profound expression.

So I guess my question is, why isn't anyone happy with wikicode, and how can I help move it towards something consistent that the community will want to use? I'm well aware that a standard without followers is not a standard. The overwhelming agreement seems to be that a standard is unnecessary. If so, why? I really want to understand. Please help. Deco 10:51, 31 Dec 2004 (UTC)

I support the idea of wikicode. Basically I think we should present code in a Python-like language that doesn't carry so much baggage. For example, we can use English sentences (or sentence fragments) instead of requiring the reader to understand some obscure Python library. (Further, Python has the baggage that there are no block-terminators: i.e., no "}" or "end"s or "fi"s or "repeat"s. By adding such terminators, we can make it a lot less ambiguous to all readers.) In other words, we're basically right on track: removing the quirks of Python, and making it higher-level. So, I support the idea myself. --MShonle 20:55, 31 Dec 2004 (UTC)

I have always felt that the wikicode rules should be less about syntax and more about overall "look and feel". The more we make it look like a formal language the harder it will be for editors to get it right and for readers to understand what was intended. I would rather have free-flowing text intermixed with some type of pseudocode that was internally consistent than to try to get everyone to write examples in the same language. IMHO, the emphasis should be on expressing the concept rather than conforming to an arbitrary language syntax, whether it be wikicode, Python, C, Java, or whatever. As long as the meaning is obvious, I don't see the problem with variations in pseudocode examples.

For example, I doubt that many people would get confused about any of the following lines:

a = b + c
a = b + c;
a := b + c;

IMHO, I think the less we try to formally specify the syntax, the better it will be. I would rather encourage people to generously comment their pseudocode instead of tell people what it should look like. If there are lots of comments, then the exact syntax shouldn't matter. wrp103 (Bill Pringle) - Talk 22:57, 31 Dec 2004 (UTC)

A different proposal

I agree with all who have said that inventing a Wikipedia-specific language is a bad idea. I also believe that it is senseless for articles to contain 10 different implementations, one in everyone's favorite language, and that programming articles should use one consistent programming language. Imagine an introduction to CS textbook that implemented every other example in a new language... not good for learning.

If you look on SourceForge by programming language, the languages there have these project counts:

15177 C++
14638 C
14578 Java
10767 PHP
5788 Perl
3911 Python
2320 C#
2263 JavaScript
2066 Visual Basic
1759 Delphi/Kylix

Which is interesting, but I don't think any of those languages are suitable example languages for many reasons, not the least of which will be endless battles between advocates of different languages, as well as unfairly making editing harder for people not as skilled in that language.
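(A small sketch for anyone who wants to put the counts above in proportion. It only does arithmetic on the ten figures quoted in this comment. The shares are relative to those ten counts, not to SourceForge as a whole, and the variable names are illustrative.)

```python
# Project counts exactly as quoted in the list above.
projects = {
    "C++": 15177, "C": 14638, "Java": 14578, "PHP": 10767, "Perl": 5788,
    "Python": 3911, "C#": 2320, "JavaScript": 2263, "Visual Basic": 2066,
    "Delphi/Kylix": 1759,
}

total = sum(projects.values())
# Share of each language among the ten listed counts only.
shares = {lang: count / total for lang, count in projects.items()}
top_three = sum(sorted(projects.values(), reverse=True)[:3])

for lang, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:13s} {share:6.1%}")
```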

My proposal is simple and is based on these assumptions and premises:

use an existing language
use a language designed to have simple clear syntax (read: teaching language)
do not pick a top 10 language, choose neutral ground
do not pick a language because computer experts like it, but because it works best in an encyclopedia
readability is more important than runability on your computer (runability is going to be different for most of the reader base)
designing our own language is silly (and the straw poll backs this up)

The proposal:

In each programming article, use the Turing programming language in the article text.
Below the example, put a list of links in short form like C, Pascal, Python, Perl, ... to sub-articles with additional examples in more useful languages.

Both parts of the proposal are critical.

Daniel Quinlan 10:05, Apr 11, 2005 (UTC)

TfD nomination of Template:Wikicode

Template:Wikicode has been nominated for deletion. You are invited to comment on the discussion at Wikipedia:Templates for deletion#Wikicode. Thank you.