▲CLI tools hidden in the Python standard librarytil.simonwillison.net

552 points by swyx 670 days ago | 154 comments

BoppreH 670 days ago [-]

Speaking of hidden Python tools, I'm a big fan of re.Scanner[0]. It's a regex-based tokenizer[1] in the `re` module, that for reasons is completely missing from any official documentation.

You give it a pattern for each token type, and a function to be called on each match, and you get back a list of processed tokens.

Importantly, it processes the list in one pass and ensures the matches are contiguous, where a naive `re.findall` with capture groups will ignore unmatched characters. You also get a reference to the running scanner, so you can record the location of the match for reporting errors.

    import re
    scanner = re.Scanner([
      (r"[0-9]+",       lambda scanner, token:("INTEGER", int(token))),
      (r"[a-z_]+",      lambda scanner, token:("IDENTIFIER", token)),
      (r"[,.]+",        lambda scanner, token:("PUNCTUATION", token)),
      (r"\s+", None), # None == skip token.
    ])

    results, remainder = scanner.scan("45 pigeons, 23 cows, 11 spiders.")
    assert not remainder
    print(results)

    [('INTEGER', 45),
     ('IDENTIFIER', 'pigeons'),
     ('PUNCTUATION', ','),
     ('INTEGER', 23),
     ('IDENTIFIER', 'cows'),
     ('PUNCTUATION', ','),
     ('INTEGER', 11),
     ('IDENTIFIER', 'spiders'),
     ('PUNCTUATION', '.')]

[0]: https://stackoverflow.com/a/693818/252218

[1]: https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization

Timon3 670 days ago [-]

It seems like the discussion of whether to document it died in April of 2003: https://mail.python.org/pipermail/python-dev/2003-April/0350...

Bummer, it's a cool feature, but I don't feel safe relying on undocumented features.

bombolo 670 days ago [-]

Especially now that they are on a crusade against their own stdlib.

They regret including most modules… it seems they regret making python altogether instead of sticking with C? :D

Znafon 670 days ago [-]

This is quite an oversimplification. Some modules have no active maintainers and take time from the small team that work on CPython. Some modules are deprecated and removed to make it possible to allocate time on other improvements, and they went back on some removals when users came forward showing that they were still needed.

There is some discussion about the rationale at https://peps.python.org/pep-0594/#rationale

bombolo 669 days ago [-]

Did you read the latest news or not? They regret asyncio! (and many other core modules).

They won't remove them… but they regret having made them.

I think without, people would have just not used python.

Redoubts 669 days ago [-]

After using Trio, I would regret asyncio too

vamega 669 days ago [-]

Where was this news? I’d like to read this.

kzrdude 670 days ago [-]

This is a typical programmer, you'll regret code as soon as you turn your back on it. :)

NegativeLatency 670 days ago [-]

Or before!

Qem 670 days ago [-]

> They regret including most modules… it seems they regret making python altogether instead of sticking with C? :D

I also find it odd. Python would probably be a little known language without the huge batteries included by default. It's invaluable when you're working in a environment where you can't fully control what is installed, what is the case of most people at work. I believe this crusade endangers the language long-term.

oefrha 670 days ago [-]

You're reading a ridiculous mischaracterization of reality (~20 deprecated modules [1] out of the ~300 PSL modules [2] is "most"?). Of course you find it odd.

[1] https://peps.python.org/pep-0594/

[2] https://docs.python.org/3/library/

toyg 670 days ago [-]

It's the trend that is a bit worrying. It's definitely been floated more than once that the stdlib should be fundamentally gutted. It has not become a reality yet, but there are some loud voices advocating to just chuck most modules as a principle into pypi.

eesmith 670 days ago [-]

"Trend"? Python has dropped quite a few modules over the decades.

"gl", "sgi", "fl", "sunaudiodev", "audioop", "stdwin", "rotor", "poly", "whatsound", "gopherlib" .. the list goes on.

Some of these were dropped with 1.6 (see "Obsolete Modules" at https://www.python.org/download/releases/1.6/ ). Some with 3.0 (see https://docs.python.org/3.10/whatsnew/2.6.html for then-upcoming removals.)

dragonwriter 670 days ago [-]

There are two strong reasons for that:

(1) Not letting little used modules that pose maintenance problems drive up the cost of maintaining Python, and

(2) Not forcing actively used, actively developed modules to be limited to the core language upgrade cadence (and not forcing users to upgrade the language to get upgrades to the modules.)

Ruby I think has a decent approach to this (particularly, one that deals with #2 better than just evicting things entirely from the standard distribution) with “Gemification” of the standard library, where most things that are moved out of the standard library aren’t moved out of the standard distribution, but into a set of gems distributed with the standard distribution but which can be upgraded independently.

oefrha 670 days ago [-]

Maybe I've been out of the loop for the past couple of years since I've been writing less Python and don't follow Twitter drama, but IIRC none of the "stdlib is where module goes to die" crowd has ever advocated a fundamental gutting of the existing stable and widely used modules in PSL.

bombolo 669 days ago [-]

I was referring to this: https://pyfound.blogspot.com/2023/05/the-python-language-sum...

Of course nobody even knew so the default response is to hit that bottom arrow :D

arp242 669 days ago [-]

"Had possibly been added before it was fully baked" is not the same as "we should never have added anything like it", and neither is "we have a much better solution but no one uses that because asyncio is in stdlib which we're now stuck with, for better or worse".

Your entire reading of that is just wrong.

burntsushi 670 days ago [-]

You'll be able to do this soon with the Rust regex crate as well. Well, by using one of its dependencies. Once regex 1.9 is out, you'll be able to do this with regex-automata:

    use regex_automata::{
        meta::Regex,
        util::iter::Searcher,
        Anchored, Input,
    };
    
    #[derive(Clone, Copy, Debug)]
    enum Token {
        Integer,
        Identifier,
        Punctuation,
    }
    
    fn main() {
        let re = Regex::new_many(&[
            r"[0-9]+",
            r"[a-z_]+",
            r"[,.]+",
            r"\s+",
        ]).unwrap();
        let hay = "45 pigeons, 23 cows, 11 spiders";
        let input = Input::new(hay).anchored(Anchored::Yes);
        let mut it = Searcher::new(input).into_matches_iter(|input| {
            Ok(re.search(input))
        }).infallible();
        for m in &mut it {
            let token = match m.pattern().as_usize() {
                0 => Token::Integer,
                1 => Token::Identifier,
                2 => Token::Punctuation,
                3 => continue,
                pid => unreachable!("unrecognized pattern ID: {:?}", pid),
            };
            println!("{:?}: {:?}", token, &hay[m.range()]);
        }
        let remainder = &hay[it.input().get_span()];
        if !remainder.is_empty() {
            println!("did not consume entire haystack");
        }
    }

And its output:

    $ cargo run
       Compiling regex-scanner v0.1.0 (/home/andrew/tmp/scratch/regex-scanner)
        Finished dev [unoptimized + debuginfo] target(s) in 0.31s
         Running `target/debug/regex-scanner`
    Integer: "45"
    Identifier: "pigeons"
    Punctuation: ","
    Integer: "23"
    Identifier: "cows"
    Punctuation: ","
    Integer: "11"
    Identifier: "spiders"

A bit more verbose than the Python, but the library is exposing much lower level components. You have to do a little more stitching to get the `Scanner` behavior. But it does everything the Python does: a single scan (using finite automata and not backtracking like Python), skip certain token types and guarantees that the entirety of the haystack is consumed.

fastasucan 669 days ago [-]

'A bit more verbose' = 2,5 times as many characters. Not saying its good or bad just saying its a bit more verbose ;)

burntsushi 669 days ago [-]

Yes, as I said, the APIs exposed in regex-automata give a lot more power. It's an "expert" level crate. You could pretty easily build a scanner-like abstraction and get pretty close to the Python code.

I posted this because a lot of regex engines don't support this type of use case. Or don't support it well without having to give something up.

n8henrie 668 days ago [-]

Interesting! Not often using either crate, this example looks like something for which I might usually look to nom. Is there a reason I should consider using regex for this use case instead (if neither is a pre-existing dependency)?

burntsushi 667 days ago [-]

I don't use nom. I've tried using parser combinator libraries in the past but generally don't like them.

That said, I don't usually use regexes for this either. Instead, I just do things by hand.

So I'm probably not the right person to answer your question unfortunately. I just know that more than one person has asked for this style of use case to be supported in the regex crate. :-)

anentropic 670 days ago [-]

FWIW there is an example of building a tokenizer/scanner using documented features of the re module here: https://docs.python.org/3.11/library/re.html#writing-a-token...

re.Scanner looks more succinct though...!

czx4f4bd 670 days ago [-]

I find that even more bizarre tbh. Thst seems like the perfect place to document the Scanner class.

matsemann 670 days ago [-]

> completely missing from any official documentation

To be fair, most things are missing from the official documentation. When I learned kotlin, I read through their official docs, and knew about most language features in a day. When I learned python, I constantly got surprised by things I hadn't seen come up in the docs. For instance decorators was (still is?) not mentioned at all in the official tutorial.

roelschroeven 670 days ago [-]

The tutorial is not supposed to cover all language features: "This tutorial does not attempt to be comprehensive and cover every single feature, or even every commonly used feature. Instead, it introduces many of Python’s most noteworthy features, and will give you a good idea of the language’s flavor and style."

But then I don't know how you're supposed to learn the features that are not in the tutorial. You can have a look at the table of contents of the standard library documentation for modules that might interest you, but that doesn't cover language features. Those are documented in The Python Language Reference, but that document is not really suited for learning from.

There are lots of websites and Youtube channels and so on, but you have to find them, and filter out the not-so-good ones which is not easy, especially for a beginner. I think there is room for some kind of official advanced tutorial to cover the gap.

BoppreH 670 days ago [-]

Decorators seem to be documented now:

https://docs.python.org/3/glossary.html#term-decorator

https://docs.python.org/3/reference/compound_stmts.html#func...

bragr 670 days ago [-]

These have always been documented. Hit the version selector: https://docs.python.org/2.7/glossary.html#term-decorator

fastasucan 669 days ago [-]

I agree. Wish Python could been better with the documentation. Its a bit absurd that things feel more clear and simple reading Rust documentation than Python documentation for me, given that Python actually is a lot more simple and clear (for me).

emmelaich 670 days ago [-]

For a long time, MatchObject was missing too.

There now -- https://docs.python.org/3/library/re.html#match-objects

fastasucan 669 days ago [-]

Woah, thats a pretty cool feature! I allways feel a bit dirty trying to do anything like that manually (usually involving a string.split(",")[0][:2] etc, just asking to break).

anitil 670 days ago [-]

Amazing! I had just written a regex tokenizer in python the other day, this would have been great!

cricalix 670 days ago [-]

Curious cat - had you considered using Antlr4 and a Python visitor/listener? (Were you aware of antlr?). Depending what you're trying to do with a regex tokenizer, it might be suitable.

anitil 669 days ago [-]

It was more just for fun than anything else. I was more interested in the recursive descent parser so the tokenizer was just a step along the way.

I've played with Antlr but never really vibed with it to be honest (and it was years ago)

sam2426679 670 days ago [-]

I believe this is used internally by json.loads, which is not very surprising in hindsight.

carlossouza 670 days ago [-]

This is one of the best ChatGPT use cases: creating regex complex patterns

nmstoker 670 days ago [-]

It's excellent at both producing them and explaining what it has generated.

And they're now just regurgitated things from the web, I've had novel ones generated fine (obviously you need to test them carefully still)

extasia 670 days ago [-]

I agree. I made a little code formatter for aligning variable assignments on adjacent lines and used chatgpt for a lot of help with the regex

tl_donson 669 days ago [-]

this one use case justified copilot as an expense for me.

lizard 670 days ago [-]

Its notable, perhaps, that the

  if __name__ == "__main__":

block allows you to do this for a _module_, i.e. a single *.py file. If you want to do this for a package, add a `__main__.py`

You can also use either of these throughout your code so that you can have

  python -m foo
  python -m foo.bar
  python -m foo.bar.baz

each doing different, but hopefully somewhat related, things.

BiteCode_dev 670 days ago [-]

It also uses the file as a module, not a script, which means suddenly relative imports works the root dir and the cwd are the same, and it is added to sys.path.

This prevents a ton of import problems, albeit for the price of more verbose typing, especially since you don't have completion on dotted path.

It is my favorite way of running my projects.

Unfortunalty it means you can't use "-m pdb", and that's a big loss.

quietbritishjim 670 days ago [-]

> which means suddenly relative imports works

Not just relative imports but also (properly formed) absolute imports. For example, if you have a directory my_pkg/ with files mod1.py and mod2.py, then

    # In my_pkg/mod2.py
    import my_pkg.mod1

will work if you run `python -m my_pkg.mod2` but will fail if you run `python my_pkg/mod2.py`

However, the script syntax does work properly with absolute paths if if you set the enviornment variable `PYTHONPATH=.` (I don't know about relative paths - I don't use those). That would presumably allow pdb to work (but, shame on me, I've never tried it).

scruple 670 days ago [-]

I also develop and run projects this way. I really, really enjoy it. It's a very pleasant experience, on both the development side and execution side.

I'm relatively new to Python (used it for ~1 year in 2007/2008, again briefly in 2014 -- which is when I believe I picked this module trick up -- and then didn't touch it again until March of this year). It's made an impression on my team and we're all having a good time developing code this way. I do wonder, though, what other shortcomings might exist with this approach.

sharikous 670 days ago [-]

You can use pdb

Just

python -m pdb -m module

BiteCode_dev 670 days ago [-]

Damn, 15 years of python and I learn you can use -m twice. I've never even tried, didn't occur to me it would be supported.

EDIT: it support other options as well, like -c. That deserves an alias:

    debug_module() {
        if python -c "import ipdb" &>/dev/null; then
            python -m ipdb -c c -m "$@"
        else
            python -m pdb -c c -m  "$@"
        fi
    }

kzrdude 670 days ago [-]

There's no magic, only layers. `python -m pdb <args>` runs pdb with the rest of the arguments. pdb handles the second `-m`.

If you have a fancy IDE feature, open a new python file, type "import pdb", use go to definition on pdb to jump to that file in the standard library, and read its main function - it handles -m explicitly :)

sharikous 670 days ago [-]

Speaking of pdb.. maybe someone knows why pdb has some issues with scope in its REPL that are resolved in VSCode's debugger and PyCharm?

Multiline statements are not accepted, nor things like if/for

Even list comprehensions and lambda expressions have trouble loading local variables defined via the REPL

Are there workarounds? It would reduce the need for using IDEs. People who have experience with Julia and Matlab are very used to a trial and error programming style in a console and bare python does not address this need

johnnymellor 670 days ago [-]

Not supporting multi-line statements is just because pdb doesn't bother to parse the statement to work out if it is an incomplete multi-line statement. That could be easily fixed (I have a prototype patch for that using `code.compile_command`).

The scope problems are more fundamental:

The pdb REPL calls[1] the exec builtin as `exec(code, frame.f_globals, frame.f_locals)`, which https://docs.python.org/3/library/functions.html#exec documents as:

"If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition."

And https://docs.python.org/3/reference/executionmodel.html#reso... documents that:

"The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods - this includes comprehensions and generator expressions since they are implemented using a function scope."

This is a fundamental limitation of `exec`. You can workaround it by only passing a single namespace dictionary to exec instead of passing separate globals and locals, which is what pdb's interact command does[2], but then it's ambiguous how to map changes to that dictionary back to the separate globals and locals dictionaries (pdb's interact command just discards any changes you make to the namespace). This too could be solved, but requires either brittle ast parsing or probably a PEP to add new functionality to exec. I'll file a bug against Python soon.

[1]: https://github.com/python/cpython/blob/25a64fd28aaaaf2d21fae...

[2]: https://github.com/python/cpython/blob/25a64fd28aaaaf2d21fae...

johnnymellor 670 days ago [-]

Oh, and multi-line statements will be supported from Python 3.13 (https://github.com/python/cpython/issues/103124)

BiteCode_dev 670 days ago [-]

There are not workaround, pdb is a rustic tool.

Since I still enjoy a cmdline debugger more than a graphical one, I use ipdb, which doesn't suffer from the multiline limitation.

However, scoping issues with lambda and comprehension are actually a Python problem, not a pdb problem.

mejutoco 669 days ago [-]

After triggering pdb by having "breakpoint()" in tour python code and dropping in the debugger you can type "interactive" in the console to enter multiline strings.

BiteCode_dev 669 days ago [-]

TIL again. Very good thread!

It's "interact" though, not "ive".

You can exit it to carry on with the regular pdb.

mejutoco 667 days ago [-]

Thanks. I make that mistake every time I use it:)

misnome 670 days ago [-]

to be fair, it was only added in 3.7

abhishekjha 670 days ago [-]

Like whats the general usage though?

python -m http.server is the most I have done.

BiteCode_dev 670 days ago [-]

-m pdb is post mortem debugging, it drops you in the debugger when it encounters the first unhandled exception. This is much easier than trying to pin point where to put a breakpoint.

jxramos 669 days ago [-]

I've always been curious how that mechanism works exactly what is it about that invocation technique that satisfies the relative imports? I think it changes the pythonpath somehow right in a way related to the module being ran, something like appending the basedir where the module is saved to the PYTHONPATH?

andreareina 670 days ago [-]

I've taken to adding a --pdb option to my scripts.

maleldil 670 days ago [-]

What does it do? Just `import pdb; pdb.set_trace()`?

Or `breakpoint()` after 3.7 (better because the user can override pdb with ipdb or other with `PYTHONBREAKPOINT`)?

andreareina 670 days ago [-]

Either breakpoint() or setting sys.excepthook to call the postmortem debugger.

polyrand 670 days ago [-]

Python 3.12 will include a SQLite CLI/REPL in the standard library too[0][1]. This is useful because most operating systems have sqlite3 and python3, but are missing the SQLite CLI.

[0]: https://github.com/python/cpython/blob/3fb7c608e5764559a718c...

[1]: https://docs.python.org/3.12/library/sqlite3.html#command-li...

agumonkey 670 days ago [-]

slightly related, emacs also includes an sqlite client / view now.. I find it funny to see everybody chasing the same need, and unsurprising since.. it's always good to have sqlite close to you.

maleldil 670 days ago [-]

It's weird that they're adding code to the stdlib without type annotations.

RGBCube 670 days ago [-]

Lots of code in the stdlib does not have type annotations. Though I think most popular modules in the std either are annotated or have stubs somewhere.

maleldil 670 days ago [-]

I know, but I expected that to be restricted only to older modules. I don't expect them to annotate all existing modules.

I don't see why they would introduce _new_ code without annotating it, when that's clearly the trend for 3rd party libraries. From a quick look, it doesn't seem like it would be difficult to type either.

RGBCube 670 days ago [-]

Isn't the change for the CLI? That's not accessible from scripts, so that may be why.

Though, I agree that typing everything is good. Especially when combined with a good typechecker like pyright.

telotortium 670 days ago [-]

Maybe submit a patch?

extasia 670 days ago [-]

The standard library has type hints in the "typeshed" github repo. Please do not submit PRs to cpython to add type hints (I made this error before too:))

chmaynard 670 days ago [-]

> This is useful because most operating systems have sqlite3 and python3, but are missing the SQLite CLI.

Not sure what you mean -- sqlite3 is the SQLite CLI.

polyrand 670 days ago [-]

Yes, I meant the DLL/.so file. The library is almost always present in the OS, the CLI is not.

TeMPOraL 670 days ago [-]

They likely mean libsqlite3, i.e. the .DLL/.so.

chmaynard 670 days ago [-]

Not sure why my comment is being downvoted. From the sqlite.org website:

    The SQLite project provides a simple command-line program named sqlite3 (or sqlite3.exe on Windows)
    that allows the user to manually enter and execute SQL statements against an SQLite database
    or against a ZIP archive.

dfc 670 days ago [-]

I am not someone who complains about "stop piping cats" but using '-e' with grep is a lot quicker and easier to read. This:

    grep -v 'test/' | grep -v 'tests/' | grep -v idlelib | grep -v turtledemo

Becomes:

    grep -ve 'test/' -e 'tests/' -e idlelib -e turtledemo

yunohn 670 days ago [-]

To be fair, the advantage of doing separate greps is working iteratively and drilling down on what you want.

kevincox 669 days ago [-]

You can iteratively add the -e flags just the same.

yunohn 669 days ago [-]

Sure, for the basic example of 5 simple greps. Usually I have some additional flags or steps in between too.

knodi123 670 days ago [-]

not this? grep -ve 'test/|tests/|idlelib|turtledemo'

xorcist 670 days ago [-]

That would be -vE, or you have to escape the pipe symbols.

bdzr 669 days ago [-]

Couldn't it also be -vP?

simonw 670 days ago [-]

Thanks, that's a useful tip.

st0le 669 days ago [-]

Also since ripgrep is already installed "grep -v" can be replaced with "rg -v"

submeta 670 days ago [-]

Nice! I use this on winboxes:

zipfile

Decompress a zip file:

    python -m zipfile -e archive.zip /path/to/extract/to

Compress a directory into a zip file:

    python -m zipfile -c new_archive.zip /path/to/directory

670 days ago [-]

throwawaymobule 669 days ago [-]

is there an advantage to this over windows builtin zip feature? I don't know if it is usable from CLI.

CJefferson 670 days ago [-]

I use http.server all the time, particularly as modern browsers disable a bunch of functionality if you open file URLs. Had no idea there was so much other stuff here!

trotro 670 days ago [-]

Same here, it's also by far the most convenient way I've found to share files between devices on my network.

bradrn 670 days ago [-]

Wait, how does that work? As far as I can see from the documentation, it can only serve on localhost, which to my understanding is only accessible from the single device it was launched on.

ptx 670 days ago [-]

I think you must have misread the documentation [1], which says: "By default, the server binds itself to all interfaces."

[1] https://docs.python.org/3/library/http.server.html

bradrn 670 days ago [-]

Not ‘misread’, just ‘missed that bit entirely’. Testing it out, this is indeed the case.

randlet 670 days ago [-]

If you serve on localhost you can usually access from other devices by using the "servers" ip address. So if your desktop where you're running the server has ip 192.168.1.10 then you can go to http://192.168.1.10 in the browser of another device on the same network.

Karellen 670 days ago [-]

But `localhost` is also an alias specifically for the loopback address (typically `127.0.0.1`), so "serve on localhost" can reasonably be interpreted as "serve on 127.0.0.1" which will only be available to other programs on that host, and not to others devices on the local network.

bradrn 670 days ago [-]

Ohh, that makes complete sense… thanks for pointing this out!

codetrotter 670 days ago [-]

Also if your host and client devices both support mDNS / Bonjour, you don’t even need to type the IP address.

For example if your Ubuntu Linux desktop machine has host name foobar, and you run a http server on for example port 8000 then you can use your iPhone and other mDNS / Bonjour capable things to open

http://foobar.local:8000/

And likewise say you have a macOS laptop with hostname “amogus” and for example http server is listening on port 8081, you can navigate on other mDNS / Bonjour capable devices and machines to

http://amogus.local:8081/

vram22 670 days ago [-]

Have you checked the full docs? Maybe it takes an optional parameter to specify the server machine's IP address or host name. Then others on the network could see it.

Not near a PC now.

seanw444 670 days ago [-]

Same for me. Although I just discovered `qrcp` which has been quite handy, and I'm sure there are many tools like it.

LtWorf 670 days ago [-]

I wrote this, a while ago https://ltworf.github.io/weborf/qweborf.html

On plasma it also installs a right click shortcut to share directory from dolphin.

vs4vijay 670 days ago [-]

FYI: you can use https://file.pizza/ for sending the file outside the network.

pinkcan 670 days ago [-]

www is aliassed on my .zshrc for that reason:

www='python -m http.server'

icar 670 days ago [-]

I use miniserve (`cargo install miniserve`). You also have available `npx serve`.

HumblyTossed 670 days ago [-]

Oh and the Rust brigade have arrived... Was only a matter of time.

IshKebab 669 days ago [-]

How dare people mention tools written in Rust?

It may be a fantastic, well loved language that's exploding in popularity and the source of endless very high quality CLI tools... but. The absolute cheek!

We must only mention Python and Bash forevermore.

HumblyTossed 669 days ago [-]

Every time there's a discussion about language X, Rusties jump in. Stop it.

One of the biggest turn offs about Rust is the community.

IshKebab 668 days ago [-]

He literally did not mention Rust.

HumblyTossed 668 days ago [-]

Implicitly, they did.

madfucker69 670 days ago [-]

Hey, stop rust-shaming

thrdbndndn 670 days ago [-]

Chrome also often takes forever to load even the simple HTMLs if it's a local file.

I suspect it's related to relative path resource but never figured it out.

9dev 670 days ago [-]

> Pretty-print JSON: > echo '{"foo": "bar", "baz": [1, 2, 3]}' | python -m json.tool

This is even more fun on MacOS if you combine it with the pbpaste/pbcopy utils:

  alias json_pretty="pbpaste | python -m json.tool | pbcopy"

That command will pretty-print any JSON in your clipboard, and write it back to the clipboard, so you can paste it somewhere else formatted!

TRiG_Ireland 670 days ago [-]

    alias json-format="python -m json.tool | pygmentize -l javascript"

scruple 670 days ago [-]

I have a similar mapping to use `json.tool` inside of vim buffers. Very useful tool that's gotten a ton of mileage from me over the last ~decade.

mg 670 days ago [-]

There is one problem with those: Security.

Using modules (even if they are in the standard library) on the command line lets malicious code in the current dir take over your machine:

https://twitter.com/marekgibney/status/1598706464583028736

cxr 670 days ago [-]

I find the entire premise of the post to be pretty baffling.

> Seth pointed out this is useful if you are on Windows and don't have the gzip utility installed.

Okay, so instead of installing gzip (or just using the decompressors that aren't the official gzip utility but that do support the format and already ship with Windows by default[1]), you install Python...?

Even if the premise weren't muddy from the start, there is a language runtime and ubiquitous cross-platform API already available on all major desktops that has a really good, industrial strength sandbox/containerization strategy that more than adequately mitigates the security issues raised here so you can e.g. rot13 without fear all day to your delight: the browser.

1. <https://news.ycombinator.com/item?id=36099528>

jerpint 670 days ago [-]

Most people who read this blog already have python installed

cxr 670 days ago [-]

The bafflement increases.

The topic of this thread is the safety and security of running Python on (or merely next to) arbitrary content.

Even ignoring that: is "most people" greater than, less than, or the same amount as "all"?

totallywrong 670 days ago [-]

> The bafflement increases.

So does the pendantry.

cxr 669 days ago [-]

Please don't derail discussions with zero-effort, low-substance nonsequitur oneliners like this.

totallywrong 669 days ago [-]

Sorry you're right. But you are being pedantic. These are quick hacks that might come in handy a couple times per year, maybe. Bringing up security, alternative native tools, and even trying to find a formal definition for "most people" is, imo, missing the point.

cxr 669 days ago [-]

Okay, so what is the point (in terms more specific than "quick hacks that might come in handy a couple times per year", and for whom, exactly)?

669 days ago [-]

yunohn 670 days ago [-]

You can definitely be in situations where you have python but not gzip.

670 days ago [-]

formerly_proven 670 days ago [-]

iirc this is one of the things earmarked for a hypothetical Python 4, making -P the default. It's also one of the many relatively well-known (security) issues in Python that don't get addressed for a surprising amount of time. Others in the same vein would be stuff like stderr being block-buffered when not using a TTY, no randomized hashes for the longest time, loading DLLs preferably from the working directory, std* using 7-bit ASCII in a number of circumstances and many more.

patrec 670 days ago [-]

> stderr being block-buffered

As opposed to line buffered, I assume? That sounds annoying, but why is it a security problem?

> no randomized hashes

I'm not up to date, but I think last I looked, I had the impression that randomized hashes didn't seem like they would fundamentally prevent collision attacks, just require more sophistication. Is that not the case?

TeMPOraL 670 days ago [-]

> loading DLLs preferably from the working directory

That is a feature, though, isn't it?

formerly_proven 670 days ago [-]

No, the problem was that python[.exe] would load pythonXY.dll etc. from the working directory instead of the installation.

Edit: I also recall issues with wheels where .so's in unexpected locations would take preference over the .so files shipped by the wheel. I believe most of that should be fixed nowadays with auditwheel and hardcoded rpaths.

jonnycomputer 670 days ago [-]

-P?

throwawaymobule 669 days ago [-]

https://docs.python.org/3/using/cmdline.html?highlight=q#cmd...

had to look myself. apparantly, it's like setting PYTHONSAFEPATH which prevents 'unsafe' paths from getting added to sys.path. new in 3.11

jonnycomputer 669 days ago [-]

New in 3.11; ah; that's why I didn't see the option in my default python.

johannes1234321 670 days ago [-]

Doing that requires palcement of files right in the directory where the user is likely to run that module.

Seems to be a quite rare vector for exploitation.

Sure, on a multiuser system I might trick some other user into running such a command in /tmp and prepare that directory accordingly, but other vectors seem more esoteric.

mg 670 days ago [-]

There are thousands of tools (shellscripts, makefiles ...) which execute "python -m":

https://www.gnod.com/search/?engines=af&nw=1&q=python%20-m

Even in Google's own repos. Starting any of those (no matter where they are stored) in a hostile repo would let the code in the repo take over the machine.

johannes1234321 670 days ago [-]

If you run anything in a hostile repo, you already lost.

jonnycomputer 670 days ago [-]

If running with -m, or -c and an import of package matching name of malicious package. Doesn't happen when running your own script (located in another directory) that imports that package, even if you are running it in that directory.

670 days ago [-]

p4bl0 670 days ago [-]

I often use `python -m http.server` e.g. to easily share files over local networks but I had no idea so many standard modules supported this. Thanks for sharing this link!

tomrod 670 days ago [-]

Do you have an example snippet you could share?

quenix 670 days ago [-]

You just start an http server with

    python3 -m http.server 8080

and then access the server on the receiving device through its IP address. It’ll show a basic directory listing, and you can download from there.

tomrod 670 days ago [-]

Thanks! Very helpful.

jon-wood 670 days ago [-]

`python3 -m http.server --help` - its really not that complex.

tomrod 670 days ago [-]

True. Glad to hear it's not hidden under 10 args.

nneonneo 670 days ago [-]

I answered a question on SO over a decade ago on this topic, back in the Python 2.7 era: https://stackoverflow.com/a/14545364/1204143

One of these days it would be nice to make an unofficial Python reference book which documents these tools, hidden features (like re.Scanner!), and other corners of the stdlib or language.

Noumenon72 670 days ago [-]

python -m json.tool seems neat. I am always dumping JSON in single log lines and then expanding it to diff it (currently using Cmd+Opt+L in PyCharm).

dmurray 670 days ago [-]

I tend to use "jq .".

An extra tool, but you don't really have to learn it, and if you're "always dumping JSON" and also use the command line a lot, you probably want to have it around anyway.

OJFord 670 days ago [-]

You don't need the `.` fwiw, just `produce-compact-json | jq` will do.

rich_sasha 670 days ago [-]

Not always, some versions of jq get confused without the dot. I work on two Linuces at work, one requires it and one doesn't.

OJFord 670 days ago [-]

The current version and for quite some time, then.

If I said 'Windows comes with the Edge browser' would you say 'I use two at work, one does but the other only has Internet Explorer'? Surely it's generally implied we're talking about things as they are, unless specified otherwise?

cl3misch 670 days ago [-]

FYI, you can also just pipe into "jq". If anything it's faster to type :-)

masklinn 670 days ago [-]

Shame gzip has one but zlib does not, that would be a very useful addition: some software create raw zlib streams on disk (e.g. git) and there’s no standard decompressor, you need to either prepend a fake gzip header, go through openssl, qpdf‘s zlib-flate, or pigz -z.

masklinn 670 days ago [-]

After looking into it, turns out gzip is a python module while zlib is a native (C) module. And I can find no hook to support `-m` with native modules.

JoBrad 669 days ago [-]

You can combine -m and -c to use zlib

gabrielsroka 670 days ago [-]

This seems to work too

https://www.google.com/search?q=site%3Adocs.python.org+%22py...

NoboruWataya 670 days ago [-]

> I thought this might provide a utility for generating random numbers, but sadly it's just a benchmarking suite with no additional command-line options:

I had the same experience with the tkinter ones - I thought they might be like zenity, a way to build simple UI elements from the command line. But they mostly just show simple non-configurable test widgets. The colour chooser could be helpful though.

TelonAlex 670 days ago [-]

The cli tool [fire](https://github.com/google/python-fire/blob/master/docs/guide...) has a nifty feature where it can generate a cli for any file for you.

So random and math are somewhat usable that way

    $ python -m fire random uniform 0 1
    0.5502786602920726

    $ python -m fire math radians 180
    3.141592653589793

    $ python -m fire math e
    2.718281828459045

Just running

    $ python -m fire random

will give you a nice "manpage" for your module as well.

mr_o47 670 days ago [-]

Thank you for putting this together and this is pretty useful as my favorite has been running http.server

codetrotter 670 days ago [-]

My favourite is

    python3 -m http.server

I use it all the time

jerpint 670 days ago [-]

Somewhat related is the ipython repl which can replace a shell and provides many useful commands as well as execute python code

SushiHippie 667 days ago [-]

`python -m turtledemo` may be not a hidden “tool”, but nonetheless it is absolutely mint.

sproketboy 670 days ago [-]

[dead]

ozymandias_kok 670 days ago [-]

Oooh vulnerabilities! Yum

nbrtx 670 days ago [-]

Why would you rely on any of these? They can be deprecated at a whim like distutils and many other things.

That's why Python requires so many blog posts.

Macha 670 days ago [-]

If by on a whim, you mean after removal after 6 years of being warned against (the first "maybe use setuptools instead" note was in Python 2.7.12 in 2016), deprecation was proposed in october 2020, agreed in january 2021, and removal will happen in Python 3.12, which... hasn't been released yet.

Loading comments...

BoppreH 670 days ago [-]

Speaking of hidden Python tools, I'm a big fan of re.Scanner[0]. It's a regex-based tokenizer[1] in the `re` module, that for reasons is completely missing from any official documentation.

You give it a pattern for each token type, and a function to be called on each match, and you get back a list of processed tokens.

    import re
    scanner = re.Scanner([
      (r"[0-9]+",       lambda scanner, token:("INTEGER", int(token))),
      (r"[a-z_]+",      lambda scanner, token:("IDENTIFIER", token)),
      (r"[,.]+",        lambda scanner, token:("PUNCTUATION", token)),
      (r"\s+", None), # None == skip token.
    ])

    results, remainder = scanner.scan("45 pigeons, 23 cows, 11 spiders.")
    assert not remainder
    print(results)

    [('INTEGER', 45),
     ('IDENTIFIER', 'pigeons'),
     ('PUNCTUATION', ','),
     ('INTEGER', 23),
     ('IDENTIFIER', 'cows'),
     ('PUNCTUATION', ','),
     ('INTEGER', 11),
     ('IDENTIFIER', 'spiders'),
     ('PUNCTUATION', '.')]

[0]: https://stackoverflow.com/a/693818/252218

[1]: https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization

Timon3 670 days ago [-]

It seems like the discussion of whether to document it died in April of 2003: https://mail.python.org/pipermail/python-dev/2003-April/0350...

Bummer, it's a cool feature, but I don't feel safe relying on undocumented features.

bombolo 670 days ago [-]

Especially now that they are on a crusade against their own stdlib.

They regret including most modules… it seems they regret making python altogether instead of sticking with C? :D

Znafon 670 days ago [-]

There is some discussion about the rationale at https://peps.python.org/pep-0594/#rationale

bombolo 669 days ago [-]

Did you read the latest news or not? They regret asyncio! (and many other core modules).

They won't remove them… but they regret having made them.

I think without, people would have just not used python.

Redoubts 669 days ago [-]

After using Trio, I would regret asyncio too

vamega 669 days ago [-]

Where was this news? I’d like to read this.

kzrdude 670 days ago [-]

This is a typical programmer, you'll regret code as soon as you turn your back on it. :)

NegativeLatency 670 days ago [-]

Or before!

Qem 670 days ago [-]

> They regret including most modules… it seems they regret making python altogether instead of sticking with C? :D

oefrha 670 days ago [-]

You're reading a ridiculous mischaracterization of reality (~20 deprecated modules [1] out of the ~300 PSL modules [2] is "most"?). Of course you find it odd.

[1] https://peps.python.org/pep-0594/

[2] https://docs.python.org/3/library/

toyg 670 days ago [-]

eesmith 670 days ago [-]

"Trend"? Python has dropped quite a few modules over the decades.

"gl", "sgi", "fl", "sunaudiodev", "audioop", "stdwin", "rotor", "poly", "whatsound", "gopherlib" .. the list goes on.

dragonwriter 670 days ago [-]

There are two strong reasons for that:

(1) Not letting little used modules that pose maintenance problems drive up the cost of maintaining Python, and

(2) Not forcing actively used, actively developed modules to be limited to the core language upgrade cadence (and not forcing users to upgrade the language to get upgrades to the modules.)

oefrha 670 days ago [-]

bombolo 669 days ago [-]

I was referring to this: https://pyfound.blogspot.com/2023/05/the-python-language-sum...

Of course nobody even knew so the default response is to hit that bottom arrow :D

arp242 669 days ago [-]

Your entire reading of that is just wrong.

burntsushi 670 days ago [-]

You'll be able to do this soon with the Rust regex crate as well. Well, by using one of its dependencies. Once regex 1.9 is out, you'll be able to do this with regex-automata:

    use regex_automata::{
        meta::Regex,
        util::iter::Searcher,
        Anchored, Input,
    };
    
    #[derive(Clone, Copy, Debug)]
    enum Token {
        Integer,
        Identifier,
        Punctuation,
    }
    
    fn main() {
        let re = Regex::new_many(&[
            r"[0-9]+",
            r"[a-z_]+",
            r"[,.]+",
            r"\s+",
        ]).unwrap();
        let hay = "45 pigeons, 23 cows, 11 spiders";
        let input = Input::new(hay).anchored(Anchored::Yes);
        let mut it = Searcher::new(input).into_matches_iter(|input| {
            Ok(re.search(input))
        }).infallible();
        for m in &mut it {
            let token = match m.pattern().as_usize() {
                0 => Token::Integer,
                1 => Token::Identifier,
                2 => Token::Punctuation,
                3 => continue,
                pid => unreachable!("unrecognized pattern ID: {:?}", pid),
            };
            println!("{:?}: {:?}", token, &hay[m.range()]);
        }
        let remainder = &hay[it.input().get_span()];
        if !remainder.is_empty() {
            println!("did not consume entire haystack");
        }
    }

And its output:

    $ cargo run
       Compiling regex-scanner v0.1.0 (/home/andrew/tmp/scratch/regex-scanner)
        Finished dev [unoptimized + debuginfo] target(s) in 0.31s
         Running `target/debug/regex-scanner`
    Integer: "45"
    Identifier: "pigeons"
    Punctuation: ","
    Integer: "23"
    Identifier: "cows"
    Punctuation: ","
    Integer: "11"
    Identifier: "spiders"

fastasucan 669 days ago [-]

'A bit more verbose' = 2,5 times as many characters. Not saying its good or bad just saying its a bit more verbose ;)

burntsushi 669 days ago [-]

I posted this because a lot of regex engines don't support this type of use case. Or don't support it well without having to give something up.

n8henrie 668 days ago [-]

burntsushi 667 days ago [-]

I don't use nom. I've tried using parser combinator libraries in the past but generally don't like them.

That said, I don't usually use regexes for this either. Instead, I just do things by hand.

So I'm probably not the right person to answer your question unfortunately. I just know that more than one person has asked for this style of use case to be supported in the regex crate. :-)

anentropic 670 days ago [-]

FWIW there is an example of building a tokenizer/scanner using documented features of the re module here: https://docs.python.org/3.11/library/re.html#writing-a-token...

re.Scanner looks more succinct though...!

czx4f4bd 670 days ago [-]

I find that even more bizarre tbh. Thst seems like the perfect place to document the Scanner class.

matsemann 670 days ago [-]

> completely missing from any official documentation

roelschroeven 670 days ago [-]

BoppreH 670 days ago [-]

Decorators seem to be documented now:

https://docs.python.org/3/glossary.html#term-decorator

https://docs.python.org/3/reference/compound_stmts.html#func...

bragr 670 days ago [-]

These have always been documented. Hit the version selector: https://docs.python.org/2.7/glossary.html#term-decorator

fastasucan 669 days ago [-]

emmelaich 670 days ago [-]

For a long time, MatchObject was missing too.

There now -- https://docs.python.org/3/library/re.html#match-objects

fastasucan 669 days ago [-]

Woah, thats a pretty cool feature! I allways feel a bit dirty trying to do anything like that manually (usually involving a string.split(",")[0][:2] etc, just asking to break).

anitil 670 days ago [-]

Amazing! I had just written a regex tokenizer in python the other day, this would have been great!

cricalix 670 days ago [-]

Curious cat - had you considered using Antlr4 and a Python visitor/listener? (Were you aware of antlr?). Depending what you're trying to do with a regex tokenizer, it might be suitable.

anitil 669 days ago [-]

It was more just for fun than anything else. I was more interested in the recursive descent parser so the tokenizer was just a step along the way.

I've played with Antlr but never really vibed with it to be honest (and it was years ago)

sam2426679 670 days ago [-]

I believe this is used internally by json.loads, which is not very surprising in hindsight.

carlossouza 670 days ago [-]

This is one of the best ChatGPT use cases: creating regex complex patterns

nmstoker 670 days ago [-]

It's excellent at both producing them and explaining what it has generated.

And they're now just regurgitated things from the web, I've had novel ones generated fine (obviously you need to test them carefully still)

extasia 670 days ago [-]

I agree. I made a little code formatter for aligning variable assignments on adjacent lines and used chatgpt for a lot of help with the regex

tl_donson 669 days ago [-]

this one use case justified copilot as an expense for me.

lizard 670 days ago [-]

Its notable, perhaps, that the

  if __name__ == "__main__":

block allows you to do this for a _module_, i.e. a single *.py file. If you want to do this for a package, add a `__main__.py`

You can also use either of these throughout your code so that you can have

  python -m foo
  python -m foo.bar
  python -m foo.bar.baz

each doing different, but hopefully somewhat related, things.

BiteCode_dev 670 days ago [-]

It also uses the file as a module, not a script, which means suddenly relative imports works the root dir and the cwd are the same, and it is added to sys.path.

This prevents a ton of import problems, albeit for the price of more verbose typing, especially since you don't have completion on dotted path.

It is my favorite way of running my projects.

Unfortunalty it means you can't use "-m pdb", and that's a big loss.

quietbritishjim 670 days ago [-]

> which means suddenly relative imports works

Not just relative imports but also (properly formed) absolute imports. For example, if you have a directory my_pkg/ with files mod1.py and mod2.py, then

    # In my_pkg/mod2.py
    import my_pkg.mod1

will work if you run `python -m my_pkg.mod2` but will fail if you run `python my_pkg/mod2.py`

scruple 670 days ago [-]

I also develop and run projects this way. I really, really enjoy it. It's a very pleasant experience, on both the development side and execution side.

sharikous 670 days ago [-]

You can use pdb

Just

python -m pdb -m module

BiteCode_dev 670 days ago [-]

Damn, 15 years of python and I learn you can use -m twice. I've never even tried, didn't occur to me it would be supported.

EDIT: it support other options as well, like -c. That deserves an alias:

    debug_module() {
        if python -c "import ipdb" &>/dev/null; then
            python -m ipdb -c c -m "$@"
        else
            python -m pdb -c c -m  "$@"
        fi
    }

kzrdude 670 days ago [-]

There's no magic, only layers. `python -m pdb <args>` runs pdb with the rest of the arguments. pdb handles the second `-m`.

sharikous 670 days ago [-]

Speaking of pdb.. maybe someone knows why pdb has some issues with scope in its REPL that are resolved in VSCode's debugger and PyCharm?

Multiline statements are not accepted, nor things like if/for

Even list comprehensions and lambda expressions have trouble loading local variables defined via the REPL

johnnymellor 670 days ago [-]

The scope problems are more fundamental:

The pdb REPL calls[1] the exec builtin as `exec(code, frame.f_globals, frame.f_locals)`, which https://docs.python.org/3/library/functions.html#exec documents as:

"If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition."

And https://docs.python.org/3/reference/executionmodel.html#reso... documents that:

[1]: https://github.com/python/cpython/blob/25a64fd28aaaaf2d21fae...

[2]: https://github.com/python/cpython/blob/25a64fd28aaaaf2d21fae...

johnnymellor 670 days ago [-]

Oh, and multi-line statements will be supported from Python 3.13 (https://github.com/python/cpython/issues/103124)

BiteCode_dev 670 days ago [-]

There are not workaround, pdb is a rustic tool.

Since I still enjoy a cmdline debugger more than a graphical one, I use ipdb, which doesn't suffer from the multiline limitation.

However, scoping issues with lambda and comprehension are actually a Python problem, not a pdb problem.

mejutoco 669 days ago [-]

After triggering pdb by having "breakpoint()" in tour python code and dropping in the debugger you can type "interactive" in the console to enter multiline strings.

BiteCode_dev 669 days ago [-]

TIL again. Very good thread!

It's "interact" though, not "ive".

You can exit it to carry on with the regular pdb.

mejutoco 667 days ago [-]

Thanks. I make that mistake every time I use it:)

misnome 670 days ago [-]

to be fair, it was only added in 3.7

abhishekjha 670 days ago [-]

Like whats the general usage though?

python -m http.server is the most I have done.

BiteCode_dev 670 days ago [-]

-m pdb is post mortem debugging, it drops you in the debugger when it encounters the first unhandled exception. This is much easier than trying to pin point where to put a breakpoint.

jxramos 669 days ago [-]

andreareina 670 days ago [-]

I've taken to adding a --pdb option to my scripts.

maleldil 670 days ago [-]

What does it do? Just `import pdb; pdb.set_trace()`?

Or `breakpoint()` after 3.7 (better because the user can override pdb with ipdb or other with `PYTHONBREAKPOINT`)?

andreareina 670 days ago [-]

Either breakpoint() or setting sys.excepthook to call the postmortem debugger.

polyrand 670 days ago [-]

Python 3.12 will include a SQLite CLI/REPL in the standard library too[0][1]. This is useful because most operating systems have sqlite3 and python3, but are missing the SQLite CLI.

[0]: https://github.com/python/cpython/blob/3fb7c608e5764559a718c...

[1]: https://docs.python.org/3.12/library/sqlite3.html#command-li...

agumonkey 670 days ago [-]

slightly related, emacs also includes an sqlite client / view now.. I find it funny to see everybody chasing the same need, and unsurprising since.. it's always good to have sqlite close to you.

maleldil 670 days ago [-]

It's weird that they're adding code to the stdlib without type annotations.

RGBCube 670 days ago [-]

Lots of code in the stdlib does not have type annotations. Though I think most popular modules in the std either are annotated or have stubs somewhere.

maleldil 670 days ago [-]

I know, but I expected that to be restricted only to older modules. I don't expect them to annotate all existing modules.

RGBCube 670 days ago [-]

Isn't the change for the CLI? That's not accessible from scripts, so that may be why.

Though, I agree that typing everything is good. Especially when combined with a good typechecker like pyright.

telotortium 670 days ago [-]

Maybe submit a patch?

extasia 670 days ago [-]

The standard library has type hints in the "typeshed" github repo. Please do not submit PRs to cpython to add type hints (I made this error before too:))

chmaynard 670 days ago [-]

> This is useful because most operating systems have sqlite3 and python3, but are missing the SQLite CLI.

Not sure what you mean -- sqlite3 is the SQLite CLI.

polyrand 670 days ago [-]

Yes, I meant the DLL/.so file. The library is almost always present in the OS, the CLI is not.

TeMPOraL 670 days ago [-]

They likely mean libsqlite3, i.e. the .DLL/.so.

chmaynard 670 days ago [-]

Not sure why my comment is being downvoted. From the sqlite.org website:

    The SQLite project provides a simple command-line program named sqlite3 (or sqlite3.exe on Windows)
    that allows the user to manually enter and execute SQL statements against an SQLite database
    or against a ZIP archive.

dfc 670 days ago [-]

I am not someone who complains about "stop piping cats" but using '-e' with grep is a lot quicker and easier to read. This:

    grep -v 'test/' | grep -v 'tests/' | grep -v idlelib | grep -v turtledemo

Becomes:

    grep -ve 'test/' -e 'tests/' -e idlelib -e turtledemo

yunohn 670 days ago [-]

To be fair, the advantage of doing separate greps is working iteratively and drilling down on what you want.

kevincox 669 days ago [-]

You can iteratively add the -e flags just the same.

yunohn 669 days ago [-]

Sure, for the basic example of 5 simple greps. Usually I have some additional flags or steps in between too.

knodi123 670 days ago [-]

not this? grep -ve 'test/|tests/|idlelib|turtledemo'

xorcist 670 days ago [-]

That would be -vE, or you have to escape the pipe symbols.

bdzr 669 days ago [-]

Couldn't it also be -vP?

simonw 670 days ago [-]

Thanks, that's a useful tip.

st0le 669 days ago [-]

Also since ripgrep is already installed "grep -v" can be replaced with "rg -v"

submeta 670 days ago [-]

Nice! I use this on winboxes:

zipfile

Decompress a zip file:

    python -m zipfile -e archive.zip /path/to/extract/to

Compress a directory into a zip file:

    python -m zipfile -c new_archive.zip /path/to/directory

670 days ago [-]

throwawaymobule 669 days ago [-]

is there an advantage to this over windows builtin zip feature? I don't know if it is usable from CLI.

CJefferson 670 days ago [-]

I use http.server all the time, particularly as modern browsers disable a bunch of functionality if you open file URLs. Had no idea there was so much other stuff here!

trotro 670 days ago [-]

Same here, it's also by far the most convenient way I've found to share files between devices on my network.

bradrn 670 days ago [-]

Wait, how does that work? As far as I can see from the documentation, it can only serve on localhost, which to my understanding is only accessible from the single device it was launched on.

ptx 670 days ago [-]

I think you must have misread the documentation [1], which says: "By default, the server binds itself to all interfaces."

[1] https://docs.python.org/3/library/http.server.html

bradrn 670 days ago [-]

Not ‘misread’, just ‘missed that bit entirely’. Testing it out, this is indeed the case.

randlet 670 days ago [-]

Karellen 670 days ago [-]

bradrn 670 days ago [-]

Ohh, that makes complete sense… thanks for pointing this out!

codetrotter 670 days ago [-]

Also if your host and client devices both support mDNS / Bonjour, you don’t even need to type the IP address.

For example if your Ubuntu Linux desktop machine has host name foobar, and you run a http server on for example port 8000 then you can use your iPhone and other mDNS / Bonjour capable things to open

http://foobar.local:8000/

And likewise say you have a macOS laptop with hostname “amogus” and for example http server is listening on port 8081, you can navigate on other mDNS / Bonjour capable devices and machines to

http://amogus.local:8081/

vram22 670 days ago [-]

Have you checked the full docs? Maybe it takes an optional parameter to specify the server machine's IP address or host name. Then others on the network could see it.

Not near a PC now.

seanw444 670 days ago [-]

Same for me. Although I just discovered `qrcp` which has been quite handy, and I'm sure there are many tools like it.

LtWorf 670 days ago [-]

I wrote this, a while ago https://ltworf.github.io/weborf/qweborf.html

On plasma it also installs a right click shortcut to share directory from dolphin.

vs4vijay 670 days ago [-]

FYI: you can use https://file.pizza/ for sending the file outside the network.

pinkcan 670 days ago [-]

www is aliassed on my .zshrc for that reason:

www='python -m http.server'

icar 670 days ago [-]

I use miniserve (`cargo install miniserve`). You also have available `npx serve`.

HumblyTossed 670 days ago [-]

Oh and the Rust brigade have arrived... Was only a matter of time.

IshKebab 669 days ago [-]

How dare people mention tools written in Rust?

It may be a fantastic, well loved language that's exploding in popularity and the source of endless very high quality CLI tools... but. The absolute cheek!

We must only mention Python and Bash forevermore.

HumblyTossed 669 days ago [-]

Every time there's a discussion about language X, Rusties jump in. Stop it.

One of the biggest turn offs about Rust is the community.

IshKebab 668 days ago [-]

He literally did not mention Rust.

HumblyTossed 668 days ago [-]

Implicitly, they did.

madfucker69 670 days ago [-]

Hey, stop rust-shaming

thrdbndndn 670 days ago [-]

Chrome also often takes forever to load even the simple HTMLs if it's a local file.

I suspect it's related to relative path resource but never figured it out.

9dev 670 days ago [-]

> Pretty-print JSON: > echo '{"foo": "bar", "baz": [1, 2, 3]}' | python -m json.tool

This is even more fun on MacOS if you combine it with the pbpaste/pbcopy utils:

  alias json_pretty="pbpaste | python -m json.tool | pbcopy"

That command will pretty-print any JSON in your clipboard, and write it back to the clipboard, so you can paste it somewhere else formatted!

TRiG_Ireland 670 days ago [-]

    alias json-format="python -m json.tool | pygmentize -l javascript"

scruple 670 days ago [-]

I have a similar mapping to use `json.tool` inside of vim buffers. Very useful tool that's gotten a ton of mileage from me over the last ~decade.

mg 670 days ago [-]

There is one problem with those: Security.

Using modules (even if they are in the standard library) on the command line lets malicious code in the current dir take over your machine:

https://twitter.com/marekgibney/status/1598706464583028736

cxr 670 days ago [-]

I find the entire premise of the post to be pretty baffling.

> Seth pointed out this is useful if you are on Windows and don't have the gzip utility installed.

1. <https://news.ycombinator.com/item?id=36099528>

jerpint 670 days ago [-]

Most people who read this blog already have python installed

cxr 670 days ago [-]

The bafflement increases.

The topic of this thread is the safety and security of running Python on (or merely next to) arbitrary content.

Even ignoring that: is "most people" greater than, less than, or the same amount as "all"?

totallywrong 670 days ago [-]

> The bafflement increases.

So does the pendantry.

cxr 669 days ago [-]

Please don't derail discussions with zero-effort, low-substance nonsequitur oneliners like this.

totallywrong 669 days ago [-]

cxr 669 days ago [-]

Okay, so what is the point (in terms more specific than "quick hacks that might come in handy a couple times per year", and for whom, exactly)?

669 days ago [-]

yunohn 670 days ago [-]

You can definitely be in situations where you have python but not gzip.

670 days ago [-]

formerly_proven 670 days ago [-]

patrec 670 days ago [-]

> stderr being block-buffered

As opposed to line buffered, I assume? That sounds annoying, but why is it a security problem?

> no randomized hashes

TeMPOraL 670 days ago [-]

> loading DLLs preferably from the working directory

That is a feature, though, isn't it?

formerly_proven 670 days ago [-]

No, the problem was that python[.exe] would load pythonXY.dll etc. from the working directory instead of the installation.

jonnycomputer 670 days ago [-]

-P?

throwawaymobule 669 days ago [-]

https://docs.python.org/3/using/cmdline.html?highlight=q#cmd...

had to look myself. apparantly, it's like setting PYTHONSAFEPATH which prevents 'unsafe' paths from getting added to sys.path. new in 3.11

jonnycomputer 669 days ago [-]

New in 3.11; ah; that's why I didn't see the option in my default python.

johannes1234321 670 days ago [-]

Doing that requires palcement of files right in the directory where the user is likely to run that module.

Seems to be a quite rare vector for exploitation.

Sure, on a multiuser system I might trick some other user into running such a command in /tmp and prepare that directory accordingly, but other vectors seem more esoteric.

mg 670 days ago [-]

There are thousands of tools (shellscripts, makefiles ...) which execute "python -m":

https://www.gnod.com/search/?engines=af&nw=1&q=python%20-m

Even in Google's own repos. Starting any of those (no matter where they are stored) in a hostile repo would let the code in the repo take over the machine.

johannes1234321 670 days ago [-]

If you run anything in a hostile repo, you already lost.

jonnycomputer 670 days ago [-]

670 days ago [-]

p4bl0 670 days ago [-]

I often use `python -m http.server` e.g. to easily share files over local networks but I had no idea so many standard modules supported this. Thanks for sharing this link!

tomrod 670 days ago [-]

Do you have an example snippet you could share?

quenix 670 days ago [-]

You just start an http server with

    python3 -m http.server 8080

and then access the server on the receiving device through its IP address. It’ll show a basic directory listing, and you can download from there.

tomrod 670 days ago [-]

Thanks! Very helpful.

jon-wood 670 days ago [-]

`python3 -m http.server --help` - its really not that complex.

tomrod 670 days ago [-]

True. Glad to hear it's not hidden under 10 args.

nneonneo 670 days ago [-]

I answered a question on SO over a decade ago on this topic, back in the Python 2.7 era: https://stackoverflow.com/a/14545364/1204143

One of these days it would be nice to make an unofficial Python reference book which documents these tools, hidden features (like re.Scanner!), and other corners of the stdlib or language.

Noumenon72 670 days ago [-]

python -m json.tool seems neat. I am always dumping JSON in single log lines and then expanding it to diff it (currently using Cmd+Opt+L in PyCharm).

dmurray 670 days ago [-]

I tend to use "jq .".

An extra tool, but you don't really have to learn it, and if you're "always dumping JSON" and also use the command line a lot, you probably want to have it around anyway.

OJFord 670 days ago [-]

You don't need the `.` fwiw, just `produce-compact-json | jq` will do.

rich_sasha 670 days ago [-]

Not always, some versions of jq get confused without the dot. I work on two Linuces at work, one requires it and one doesn't.

OJFord 670 days ago [-]

The current version and for quite some time, then.

cl3misch 670 days ago [-]

FYI, you can also just pipe into "jq". If anything it's faster to type :-)

masklinn 670 days ago [-]

After looking into it, turns out gzip is a python module while zlib is a native (C) module. And I can find no hook to support `-m` with native modules.

JoBrad 669 days ago [-]

You can combine -m and -c to use zlib

gabrielsroka 670 days ago [-]

This seems to work too

https://www.google.com/search?q=site%3Adocs.python.org+%22py...

NoboruWataya 670 days ago [-]

> I thought this might provide a utility for generating random numbers, but sadly it's just a benchmarking suite with no additional command-line options:

TelonAlex 670 days ago [-]

The cli tool [fire](https://github.com/google/python-fire/blob/master/docs/guide...) has a nifty feature where it can generate a cli for any file for you.

So random and math are somewhat usable that way

    $ python -m fire random uniform 0 1
    0.5502786602920726

    $ python -m fire math radians 180
    3.141592653589793

    $ python -m fire math e
    2.718281828459045

Just running

    $ python -m fire random

will give you a nice "manpage" for your module as well.

mr_o47 670 days ago [-]

Thank you for putting this together and this is pretty useful as my favorite has been running http.server

codetrotter 670 days ago [-]

My favourite is

    python3 -m http.server

I use it all the time

jerpint 670 days ago [-]

Somewhat related is the ipython repl which can replace a shell and provides many useful commands as well as execute python code

SushiHippie 667 days ago [-]

`python -m turtledemo` may be not a hidden “tool”, but nonetheless it is absolutely mint.

sproketboy 670 days ago [-]

[dead]

ozymandias_kok 670 days ago [-]

Oooh vulnerabilities! Yum

nbrtx 670 days ago [-]

Why would you rely on any of these? They can be deprecated at a whim like distutils and many other things.

That's why Python requires so many blog posts.

Macha 670 days ago [-]