Sat Jan 24 09:27:16 PM IST 2026
===============================

There were too many files tracking different thoughts and ideas for
things to do, or consider doing.  This file merges them into one. As
tasks are completed, they should be removed.

This file should exist only in the master branch or branches based off
of it for development, but not in the stable branch. This may require some
careful work with Git.

TODO
====

Roadmap for 6.0
---------------

	Nuke regex and dfa and the additional files needed for them.

	Get rid of lint-old.  Old awk has been obsolete for close to 40
	years now.

	Get rid of the --persist option.

	Get rid of the -r/--re-interval option(s).

	Get rid of AWKNUM. Just use double.

	Clean up obsolete bits in the extension API. This will
	require incrementing the major number.

	Eli Zaretskii suggests making codeset info available in the
	extension API.  Along similar lines, perhaps the API should
	support access to the wide character version of a string.
	Also mb_cur_max.

Minor Cleanups and Code Improvements
------------------------------------

	Tzvetelin Katchov <katchov@gnu.org> points out the following:

	- Clean up file and directory permissions to 0644 and 0755
	across the board.

	- Use consistent #! lines in awk scripts in the test suite.

	- Double check uses of xxx:: targets in the test suite Makefile,
	maybe only one colon is needed.

	API:
		??? #if !defined(GAWK) && !defined(GAWK_OMIT_CONVENIENCE_MACROS)

	?? Add debugger commands to reference card

	Look at function order within files.

	Fully synchronize whitespace tests (for \s, \S in Unicode
	environment) with those of GNU grep.

	In the test suite, use grep -E if no egrep or egrep produces
	a warning.

Minor New Features
------------------

	Store the filename and line number where a variable is
	first used.

	Enable command line source text in the debugger.

	Enhance extension/fork.c waitpid to allow the caller to specify
	the options.  And add an optional array argument to wait and
	waitpid in which to return exit status information.

	Consider relaxing the strictness of --posix.

	Enhance --lint=invalid to apply in more places.

	Suggested by Jannick:
	* It is possible to make gawk look for a file with extension name
	plus API version number(s) in case a shared lib with the expected
	basename cannot be found?  This would help have extension versions
	compiled against different API versions in one single directory
	and make gawk pick the extension with the right API version.

Major New Features
------------------

	Think about how to generalize indirect access. Manuel Collado
	suggests things like

		foo = 5
		@"foo" += 4

	Also needed:

		Indirect through array elements, not just scalar variables

	Rework management of array index storage. (Partially DONE.)

	Consider using an atom table for all string array indices.

	DBM storage of awk arrays. Try to allow multiple dbm packages.

Things To Think About That May Never Happen
-------------------------------------------

	Consider making shadowed variables a warning and not
	a fatal warning when --lint=fatal.

	Similar for extra parameters in a function call.

	Look at code coverage tools, like S2E: https://s2e.epfl.ch/
	
	Try running with diehard. See http://www.diehard-software.org,
	https://github.com/emeryberger/DieHard

	Add a lint check if the return value of a function is used but
	the function did not supply a value.

Things That We Decided We Will Never Do
=======================================

	Add ability to do decimal arithmetic.

	Review the bash source script for working with shared libraries in
	order to nuke the use of libtool. [ Partially started in the
	dead-branches/nolibtool branch. ]

	Include a sample rpm spec file in a new packaging subdirectory.

	Patch lexer for @include and @load to make quotes optional.

	Add an optional base to strtonum, allowing 2-36.

	Optional third argument for index indicating where to start the
	search.

	See if something like  b = a "" can be optimized to not do
	a concatenation, but instead just set STRCUR on a.
	(Tried this; the type of b doesn't come out correctly.)

	Consider moving var_value info into Node_var itself to reduce
	memory usage. This would break all uses of get_lhs in the
	code. It's too sweeping a change.

	Add macros for working with flags instead of using & and |
	directly.

	Fix regular field splitting to use FPAT algorithm.
		Note: Looked at this. Not sure it's with the trouble:
		If it ain't broke...

	Scope IDs for IPv6 addresses

	Gnulib

	Make FIELDWIDTHS be an array?

	"Do an optimization pass over parse tree?"
	This isn't relevant now that we are using a byte code engine.

	"Consider integrating Fred Fish's DBUG library into gawk."
	I did this once as an experiment. But I don't see a lot of value
	to this at this stage of the development. Stepping through things
	in a debugger is generally enough. Also, I would have to try to
	track down the latest version of this.

	"Make 	awk '/foo/' files...	run at egrep speeds" (How?)
	This has been on the list since the early days (gawk 1.x or early
	2.x).  But I am not sure how to really do this, nor have I done
	timings, nor does there seem to be any real demand for this.

	Change from dlopen to using the libltdl library (i.e. lt_dlopen).
	This may support more platforms.  If we move off of libtool
	then this is the wrong direction.

	A RECLEN variable for fixed-length record input. PROCINFO["RS"]
	would be "RS" or "RECLEN" depending upon what's in use.
	There is a reclen extension in gawkextlib. That's good enough.

	Rewrite in C++.

	Consider making gawk output +nan for NaN values so that it
	will accept its own output as input.
		NOTE: Investigated this.  GLIBC formats NaN as '-nan'
		and -NaN as 'nan'.  Dealing with this is not simple.
	Gawk already produces +nan and -nan for NaN values, it's just
	that the sign may not always be what one expects.
