I work with a bunch of 'data scientists' / 'strategists' and the like who love their notebooks but it's a pain to convert their code into an application!
In particular:
* Notebooks store code and data together, which is very messy if you want to look at [only] code history in git.
* It's hard to turn a notebook into an assertive test.
* Converting a notebook function into a python module basically involves cutting and pasting from the notebook into a .py file.
These must be common issues for anyone working in this area. Are there any guides on best practices for bridging from notebooks to applications?
Ideally I'd want to build a python application that's managed via git, but some modules / functions are lifted exactly from notebooks.
> Are there any guides on best practices for bridging from notebooks to applications?
The main point of friction is that the "default" format for storing notebooks is not valid, human-readable python code, but an unreadable json mess. The situation would be much better if a notebook was stored as a python file, with code cells verbatim, and markdown cells inside python comments with appropriate line breaking. That way, you could run and edit notebooks from outside the browser, and let git track them easily. Ah, what a nice world that would be.
But this is exactly the world we already live in, thanks to jupytext!
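For anyone curious, a rough sketch of the round trip via jupytext's Python API (file names here are made up; the format is inferred from the extension):

    import jupytext

    nb = jupytext.read("analysis.py")       # a plain, git-friendly script in the percent format
    jupytext.write(nb, "analysis.ipynb")    # the same content as a regular notebook, and back again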
Or you could do what I do, and write the report as specially marked comments in the actual code, which can be grepped out later to create a valid markdown document.
Pipe into pandoc, prepend some css, optionally a mathjax header, done. Beautiful reports.
Honestly I've yet to be convinced there's good reason for anything more than this.
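A rough sketch of the extraction step (the "#|" marker is arbitrary, anything greppable works, and the file names are made up):

    # report_extract.py - pull "#|" comment lines out of a script as markdown
    import re
    import sys

    def extract_report(path):
        with open(path) as f:
            for line in f:
                m = re.match(r"\s*#\|\s?(.*)", line)
                if m:
                    print(m.group(1))

    if __name__ == "__main__":
        extract_report(sys.argv[1])

Pipe the output into pandoc as described and you have your report.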
Yes, I use a very similar setup with a three-line makefile to test and build. But the OP wanted to use the in-browser notebook interface, and this is still possible via jupytext (while allowing collaboration with out-of-browser users).
1) This is painful. There are tools to help, but the most effective approach I've found is a policy of only committing notebooks in a reset, clean state (enforced with a git hook).
2) I don't understand. I've written full testing frameworks for applications as notebooks, as a means of having code documentation that enforced/tested the non-programmatic statements in the document. Using tools like papermill (https://papermill.readthedocs.io/en/latest/), you can easily write a unit test as a notebook with a whole host of documentation around what it's doing, execute it, and inspect the result (failed execution vs. final state of the notebook vs. whatever you want). See the sketch after this list.
3) Projects like ipynb (https://ipynb.readthedocs.io/en/stable/) allow you to import notebooks as if they were python modules. Some projects have different opinions of what that means, to match different use cases. Papermill allows you to have an interface with a notebook that is more like a system call than importing a module. I've personally used papermill and ipynb and found both enjoyable for different flavors of blending applications and notebooks.
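Rough sketches of both patterns (notebook names and parameters below are purely illustrative):

    # papermill: execute a notebook as a parameterized, assertive test
    import papermill as pm

    pm.execute_notebook(
        "tests/check_model.ipynb",       # assert statements inside the notebook raise on failure
        "out/check_model_run.ipynb",     # the executed copy is kept for inspection
        parameters={"threshold": 0.5},
    )

    # ipynb: import definitions straight from a notebook in the same directory
    from ipynb.fs.defs.my_notebook import train_model   # pulls in defs/classes/imports only
    # (ipynb.fs.full.my_notebook would execute the whole notebook instead)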
This problem is one reason why I'm a little mystified by Jupyter's widespread adoption. It's got a lot of neat features, but the Rstudio/Rmarkdown combo solves the above problem, and for me at least, that's decisive. As a tradeoff, you deal with an IDE that, in a bunch of ways, adds friction to writing Python code; but I gather that the Rstudio team is working on that (https://www.rstudio.com/solutions/r-and-python/). Not trying to start a flamewar here, I actually just don't get why Jupyter has become the default.
(Caveat that Jupyter is way better with e.g. Julia, in my (limited) experience)
For R&D the feedback loops are much tighter for sketching an algorithm line by line in Jupyter vs a Python file. Error in the 20th function? Ok, fine, then I'll just change the cell it's defined in and continue from the state after the 19th. If I forget the layout or type of an object, just inspect it right there in a new cell.
Especially if it deals with multimedia, can just blit images or audio or HTML applications inline.
And it’s fairly trivial to go from Jupyter Notebook -> Python file once you’re done.
Specifically I think they were comparing rmarkdown vs jupyter. And it's really no contest, all the things people hate about jupyter are solved by rmarkdown (and org mode, but that's a harder sell)
The problem with RStudio is that it uses R, which while excellent at numerical calculations, is terrible at everything else - data parsing, string munging, file processing, ...
As the joke goes: The best thing about R is that it's designed by statisticians. The worst thing about R is that it's designed by statisticians.
specifically "data parsing", "string munging", and "file processing"?
I've used R extensively for all of these, and having recently re-visited the python world don't see any advantage that Python has over R for any of these tasks.
My wife has been learning Python (not a programmer) and now is looking at R. I thought she was going to like it, as I personally think RStudio is nice. I was surprised she didn't like Rmarkdown after being exposed to Python notebooks; in particular she loved vscode + notebooks and the immediate feedback, and didn't like at all that RStudio doesn't render the markdown interactively, or the R REPL. I have used very little R and I'm a heavy Python user, so maybe I didn't know how to help her more effectively. I think I helped solve the main Python pain points: installing anaconda, vscode, the python extension and some additional auto completion. I don't use vscode (I use Emacs) but it's great it's available for newbie users :p. Also, having Colab was nice for simple things.
To summarize: I think notebooks are great for newcomers. It requires more maturity to appreciate more principled programming.
Avoid if possible, is the easiest answer. Encourage your colleagues to move their code into proper packages when they're happy with it, and restrict notebooks to _use_ of their code.
Failing that, I think fast.ai's nbdev[0] is probably the most persuasive attempt at making notebooks a useable platform for library/application development. Netflix also has reported[1] substantial investment in notebooks as a development platform, and open-sourced many/most of their tools.
I've worked as a data scientist for quite a while now in IC, lead and manager roles, and the biggest thing I've found is that data scientists cannot be allowed to live exclusively in notebooks.
Notebooks are essential for the EDA and early prototyping stages but all data scientists should be enough "software engineer" to get their code out of their notebook and into a reusable library/package of tools shared with engineering.
On the best teams I've worked on, the handoff between DS and engineering is not a notebook, it's a pull request, with code review from engineers. Data scientists must put their models in a standard format in a library used by engineering, they must create their own unit tests, and be subject to the same code review that an engineer would. This last step is important: my experience is that many data scientists, especially those coming from academic research, are scared of writing real code. However, after a few rounds of getting helpful feedback from engineers they quickly figure out how to write much better code.
This process is also essential because if you are shipping models to production, you will encounter bugs that require a data scientist to fix, that an engineer cannot solve alone. If the data scientists aren't familiar with the model part of the code base this process is a nightmare, as you have to ask them to dust off questionable notebooks from months or years ago.
There are lots of parts of the process of shipping a model to production that data scientists don't need to worry about, but they absolutely should be working as engineers at the final stage of the hand off.
I agree with everything you said above and that is exactly how we have always had things at my place of employment (work at a small ML/Algorithm/Software development shop). That being said, the one thing I really don't understand is why Notebooks are essential even for EDA. I guess if you were doing things in Notepad++ or a pure REPL shell, they are handy, but using a powerful IDE like Pycharm makes Notebooks feel very very limiting in comparison.
Browsing code, underlying library imports and associated code, type hinting, error checking, etc., are so vastly superior in something like Pycharm that it is really hard to see why one would give it all up to work in a Notebook unless they never matured their skillsets to see the benefits afforded by a more powerful IDE? I think notebooks can have their place and are certainly great for documenting things with a mix of Markdown, LaTeX and code, as well as for tutorials that someone else can directly execute. And some of the interactive widgets can also make for nice demos when needed.
Notebooks also often make for poor habits, and as you mentioned, having data scientists and ML engineers write code as modules or commit it via pull requests helps them grow into being better software engineers, which in my experience is almost a necessity to be a good and effective data scientist or ML engineer.
And lastly, version controlling notebooks is such a nightmare, nor are they conducive to code reviews.
There's an advantage to long-lived interpreters/REPLs on remote machines for the kind of work done in notebooks. Significant amounts of data may have to be read, expensive computation performed, etc. before the work can begin. Notebooks are an ergonomic interface to that sort of environment if one isn't comfortable with ssh/screen/X-forwarding/etc, and frankly nice for some tasks even if one is.
There's also a tacit advantage to notebooks specifically for Python, as the interface encourages the user to write all of their definitions in a single namespace. So, the user can define and re-define things at their leisure within a single REPL/interpreter lifetime. A user developing against import-ed modules can quickly get stuck behind python's inability to cleanly re-import a module, or be forced to rely on flaky hacks to the import system.
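To be concrete about the re-import pain, the usual workaround looks like this (module name is a placeholder):

    import importlib
    import mymodule

    importlib.reload(mymodule)   # re-executes the module's top level, but...
    # - objects created before the reload still reference the old classes/functions
    # - names pulled in elsewhere with "from mymodule import foo" are not updated
    # - modules that mymodule itself imports are not reloaded transitively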
It pains me a bit to make the argument _for_ notebooks, but it's important to understand the attractions.
Thanks for sharing that perspective! It was helpful to get that POV. I agree that a requirement for long lived interpreters and a simpler UX to get up and running probably makes it an attractive option.
With VSCode having such excellent remote development capabilities now, however, it feels like a nicer option these days, but I guess only if you really care about the benefits that brings. Agreed about reimporting libraries still being a major pain point in Python, but the "advantage" of Jupyter notebooks is also unfortunately what leads to terrible practices and bad engineering, as most non-disciplined engineers end up treating it as one giant script for spaghetti code to get the job done.
When EDA involves rendering tables or graphics, notebooks provide a faster default feedback loop. Part of this comes from the assumption that the kernel holds state, and data loading, transformations, and viz can be run incrementally and without switching context. That's not to say that it's not possible to do with a python repl and terminal with image support, but that's essentially the value prop of notebooks. Terrible for other things like shipping code, but very good for interactive sessions like EDA work.
Personally, I find myself prototyping in notebooks and then refactoring into scripts very often and productively.
I've found myself in a data science group by merger and this (what type of artifact to ship) is a current team discussion point. Would you be willing to let me pick your brain on this topic in depth?
This is how my lab works. We do a lot of prototyping, exploring, making sure everything seems to be working, etc. and then pack it all into reasonably well documented standard code.
Learned this the hard way after working for a group for a while with a single shared notebook I had nicknamed "The wall of madness".
Atom (editor) + Hydrogen (Atom plugin).
I like Hydrogen over more notebook-like plugins that exist for VSCode because it's nothing extra (no 'cells') beyond executing the line under your cursor/selection.
Then I just start coding, executing/testing, refactoring, moving functions to separate files, importing, calling my own APIs... rinse, repeat.
I tend to maintain 3 'types' of .py files.
1. first class python modules - the refactored and nicely packaged re-usable code from all my tinkering
2. workspace files - these are my working files. I solve problems here. It gets messy, and doesn't necessarily execute top to bottom properly (I'm often highlighting a line and running just it, in the middle of the file)
3. polished workspaces - once I've solved a problem ("pull all the logs from this service and compute average latency, print a table"), I take the workspace file and turn it into a script that executes top to bottom so I can run it in any context.
This is a daily pain we've experienced while working in the industry! Our projects would usually allocate a few weeks to refactor notebooks before deployment! So we started working on an open-source framework to help us produce maintainable work from Jupyter. It allows easy git collaboration and eases deployment. https://github.com/ploomber/ploomber
I've been using ploomber for a month and so far, I really like it. The developers have been super helpful. It hits the sweet spot for writing developer-friendly, maintainable scientific code. Our data science team is looking at adopting it as our team's standard for deployments.
Admittedly, I'm one of those people. This problem also applies to the use of Excel for exploratory programming and analysis.
There are no guides that I'm aware of. Part of the reason may be a mild "culture" divide between casual and professional programmers, for lack of better terms. Any HN thread about "scientific" programming will include some comments to the effect that we should just leave programming to the pro's.
My advice is to immerse yourself in the actual work environment of the casual programmers: Observe how we work, what pressures and obstacles we face, what makes our domain unique, and so forth. Figure out what solutions work for the people in the trenches. My team hired an experienced dev, and I asked him specifically to help me with this. One thing I can say for sure is that practical measures will be incremental -- ways that we can improve our code on the fly. They will also have to recognize a vast range of skills, ranging from raw beginners to coders with decades of experience (and habits).
Jot down what you learn, and share it. I think our side of the cultural divide needs help, and would welcome some guidance.
I agree with you, having been on both sides of the divide and researched & written my masters thesis on teaching programming to undergrad science students.
Are you aware of https://software-carpentry.org/? It started after I graduated and I knew people who were involved with it at the time. It seemed like a good idea.
It looks like I didn't put it on Arxiv, so I need to find a copy and then put it back online :) Will reply here when I do, but likely to be a week+ before I do
There’s nothing wrong with excel (as long as you stay below the 64k limit). People use it because it works. That is almost tautologically close to whatever it is that software aspires to.
Excel has gotten more people to write code than all other programming environments together. And they’ve often enjoyed doing it. It’s a fantastic success story.
- When turning notebooks into more user-facing prototypes, I've found Streamlit is excellent and easy-to-use. Some of these prototypes have stuck around as Streamlit apps when there's 1-3 users who need to use them regularly.
- Moving to full-blown apps is much tougher and time-consuming.
This is a great insight! I think parameterizing the notebooks is part of the solution, moving to production shouldn't be time-consuming and definitely no need to refactor the code like I've seen some people do. I'd love to get your feedback. We're building a framework to help people develop maintainable work from Jupyter! https://github.com/ploomber/ploomber
First, yes, this is a common question. IPython does not try to deal with that, it's just the executing engine.
Notebooks do not have to be stored in ipynb form; I would suggest looking at https://github.com/mwouts/jupytext. The notebook UI is inherently not designed for multi-file and application development, so training humans will always be necessary.
Technically Jupyter Notebook does not even care that notebooks are files; you could save them using, say, postgres (https://github.com/quantopian/pgcontents), and even sync content between notebooks.
I'm not too well informed anymore on this particular topic, but there are other folks at https://www.quansight.com/ that might be more aware, you can also ask on discourse.jupyter.org, I'm pretty sure you can find threads on those issues.
I think on the Jupyter side we could do a better job curating and exposing many tools to help with that, but there are just so many hours in the day...
I also recommend "I Don't Like Notebooks" from Joel Grus, https://www.youtube.com/watch?v=7jiPeIFXb6U. It's a really funny talk; a lot of the points are IMHO invalid, as Joel is misinformed on how things can be configured, but still a great watch.
I see where you're coming from. From where you sit Jupyter is a language agnostic tool and so on. But the fact that there are dozens of solutions in this space is surely a problem?
I'd have thought there would be some things you could strongly encourage:
1. Come up with some standard format where the code and the data live in separate files.
2. Come up with some standard format where you can load a regular .py script as a cell-based notebook using metadata comments (and save it again).
If these came out of the box it would solve most of the issues.
Funny you should ask. I just wrote a book called Effective Pandas[0] that discusses ways to use pandas (in Jupyter) that leads to easy re-use, sharing, production, testing. Here's a video with many of the ideas if you prefer [1].
People tend to have strong feelings when they see my pandas code, as it is different from much of the (bad) advice in the Medium echo chamber. Generally, most who try it out are very happy.
The basics are embrace chaining, avoid .apply, and organize notebooks with functions (using the chain).
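A tiny, made-up example of the style (the column names are invented):

    import pandas as pd

    def tweak_orders(raw: pd.DataFrame) -> pd.DataFrame:
        return (raw
                .loc[:, ["order_id", "amount", "region"]]
                .assign(amount=lambda df_: df_["amount"].astype(float))
                .query("amount > 0"))

    # orders = tweak_orders(pd.read_csv("orders.csv"))
    # summary = orders.groupby("region")["amount"].sum()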
Oh, and Jupytext is a life saver if you are someone who uses source control.
The whole point of notebooks is to focus only on exploration of data, making some nice plots, adding some explanatory text, and NEVER think about software engineering.
A decent data scientist who also understands software engineering will sooner or later take the prototype code from the notebook and refactor it into proper modules. Either this or the notebook will become an unrunnable mess as it is developed further. Reusing code and functions in a grown notebook is just too fragile.
I'm working on a solution that helps with transforming notebooks into web applications (with a GUI). You just need to define a YAML config (similar to R Markdown) and the framework will generate a web app with interactive widgets. After a change in the widgets, the user clicks the Run button and the whole notebook is executed, converted to HTML and displayed to the user.
The problems you mention are solved by auxiliary tools in the notebook ecosystem.
- Look at nbdime & ReviewNB for git diffs
- Checkout treon & nbdev for testing
- See jupytext for keeping .py & .ipynb in sync
I agree it's a bit of a pain to install & configure a bunch of auxiliary tools but once set up properly they do solve most of the issues in the Jupyter notebook workflow.
It is only a plan (partially implemented). I am separating code into clean and ad-hoc. Clean code is "supported" - maintained (jobs monitored / failures handled / bugs fixed) by more professional developers; if somebody wants a custom job, they are more or less on their own.
When I am asked to fix a problem in such a "custom" job, the first thing I do is refactor the code to follow standards (configuration, hardcoded paths and values, logging, alert notifications to a predefined list of people related to the project, handling recovery, etc.); then it becomes part of the main pool - "maintained code".
In VS Code, a .py file can work like a notebook. VS Code treats #%% as the start of a cell, while it remains a plain comment when running the file as a .py script. VS Code can also convert an existing Jupyter notebook to .py with this format.
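For example, a file like the following runs as an ordinary script but shows up as cells in VS Code (contents are illustrative):

    # %% [markdown]
    # ## Explore the data
    # This renders as a markdown cell in the notebook UI, and is just a comment in plain Python.

    # %%
    import pandas as pd
    df = pd.read_csv("data.csv")   # file name is made up
    df.describe()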
Instead of looking for a quick 1:1 conversion from notebook --> app, it should be a line by line re-creation using the notebook as more of a whiteboard.
This approach, while much slower, limits errors and ensures sustainability, because both the notebook creator and the app creator will know what's going on.
I think solutions like papermill and others only work when you have infinite money and time.
I agree with the idea of using it as a whiteboard - when I need to do casual programming and data analysis for my non-software job I tend to work it out in a notebook first, then start combining all the smaller cells into larger chunks that I then can move into a proper python script.
This is a fundamental problem for me too. No source control, no tests, hard to extract into libraries. I'm surprised there isn't a better tool already.
if you are "cutting and pasting from the notebook into a .py file" you should look at `jupyter nbconvert` on the CLI.
I think there's ways to feed it a template that basically metaprograms what you want the output .py file to look like (e.g. render markdown cells as comments, vs. just removing them), but I've never quite figured that out.
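The basic conversion is just `jupyter nbconvert --to script your_notebook.ipynb`. Roughly the same thing from Python, if you want it in a build step (paths are illustrative):

    from nbconvert import PythonExporter

    source, _resources = PythonExporter().from_filename("analysis.ipynb")
    with open("analysis.py", "w") as f:
        f.write(source)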
Excited about this and have been a big fan of iPython since I started coding in Python well over a decade ago. Might be a little while before switching to v8 due to the minimum 3.8+ version requirement for Python (which I totally understand from an ease of maintenance and forward looking mindset for the iPython project). I use it all the time as my go-to REPL in Pycharm. Thanks to the team for all the work on it!
As an aside, I really wish the VSCode team did more to integrate iPython REPL more seamlessly into VSCode as that is one of the big blockers for me to using VSCode for anything Python related.
Thanks, in particular for your understanding with Python 3.8. It's in huge part to give a signal to businesses that they can/should move forward, and to give reasons for "smaller" projects with fewer devs to also remove support for older Python, which can be a burden to maintain.
I don't use VS Code myself, but I think the team is doing an increasingly better job; Microsoft is just a huge beast. I would also love for some IPython features to get into Core Python. But that might just take time, as I don't think many Core Python devs do that much interactive coding, and thus don't see as much interest in doing so.
BTW it's uppercase I and P, we don't want to be in trouble with a billion dollar fruit company, even if we predate their use of iPxxxx
IPython (terminal repl) with autoreload has been God mode for me for about 10 years now. No other environment even comes close when it comes to exploring data, sketching out code and hacking towards a solution. And once you get most of the way there, stick stuff in a file and work with vim while IPython silently and reliably hot reloads all the code without losing the data you have loaded in your objects. It’s an absolute pleasure to use.
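For anyone who hasn't tried it, the whole setup is two lines at the start of a session (or in your IPython profile):

    %load_ext autoreload
    %autoreload 2   # re-import all modules (except those excluded via %aimport) before executing code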
Thanks for your work on it, it really is much appreciated.
Agreed. I have exactly the same workflow except I end up using Pycharm with the interactive IPython REPL that it integrates with and graduate code up into modules in PyCharm. I like the variable viewer in PyCharm as that ends up being really handy when prototyping things in the REPL especially when working with data and writing algorithms, etc. It really does feel like "cheating"!
In solidarity with sibling comments, I also want to say that IPython has been pretty much my default shell for about a decade now. Auto completion, magic functions (paste, edit, pylab come to mind), auto reload, nice colors out of the box... it’s become a beloved piece of software to me over the years. Back in college, being introduced to IPython almost made it seem like Matlab wasn’t doing enough! I’m gushing a bit, but honestly IPython has solved many a problem for me, and I want to say thanks.
> As an aside, I really wish the VSCode team did more to integrate iPython REPL more seamlessly into VSCode as that is one of the big blockers for me to using VSCode for anything Python related.
VS recently made big changes to notebooks support [1], and they are now fully integrated into VS with their own Notebooks API. I've been following the changes for the past year on VS Code Insiders and the latest integration is really impressive from a UI and developer point of view. What's more is VS Code lets you easily use notebooks with any language (not just Python). I've had a really good experience so far using Julia kernels.
Thanks for the reply. Unfortunately, I am not looking for a Notebook experience but rather for integration of the IPython REPL shell as the default shell in VSCode so all code is being executed in there and so you can interactively prototype or debug code in an IPython shell. It is insanely more capable and powerful than any other regular Python shell, and without it, VSCode just feels a lot more gimped for Python development. Pycharm otoh is an example of an IDE that absolutely nails IPython shell integration into the IDE.
My default way of running snippets of Python code in VS Code is selecting code, and pressing shift+enter (mac) to execute the Python in an interactive window, which (I think?) is an IPython REPL.
Does the PyCharm integration offer something more capable? I'm interested in what I'm missing out on!
I don't believe the interactive REPL in VSCode is IPython. It is just a regular Python REPL from what I have seen. Some of the really nice advantages of the IPython REPL integration in Pycharm (along with a lot of extra legwork and features added by Jetbrains) are:
1. Multi-line text support and auto indentation support. This is *huge*. Most python REPLs are terrible at this including the default interpreter. You can easily copy paste code from scripts/modules into the REPL and the interpreter just handles everything seamlessly. It is even smart enough to remove a global indent across all the pasted code (if you copied code from within a function that was already indented 1 level up). It makes the REPL experience really really smooth.
2. Tab completion works beautifully with hover overlays
3. The integrated variable viewer is extremely good and you can easily view the local state of your interpreter and explore data/variables. The integrated Pandas dataframe and Numpy array viewers (available even in the free version) are really handy as well.
4. You can even attach a debugger to an interactive REPL session and if you then have breakpoints defined in associated libraries in your Pycharm IDE and then invoke code that would hit the breakpoint, it will pause at the breakpoint and give you the full debugging experience. This is really handy for reducing the time to debugging and investigating issues in code.
5. Matplotlib eventloop is handled very well in Pycharm which basically means that interactively plotting in the IPython REPL using matplotlib works seamlessly.
6. You also get some amount of linting/error checking in the REPL and also syntax highlighting, which is really helpful as well.
7. The IPython interpreter *is the default interpreter* which means that even when you debug code (with breakpoints for example), you get all of the benefits above while debugging, which is a really nice experience, especially with having access to the variable viewer.
8. Another annoyance I had with VSCode last time I tried using it is that the debugger while vastly improved still only allowed single line of code entry and was generally clunky if you wanted to paste multiple lines of code into the debugging REPL. Since you get the full IPython shell in Pycharm at all times (debugging or otherwise), it ends up being a lot more powerful and easier to use.
9. This is underrated, but Pycharm actually has a button that displays a log of all your code entries into your REPL. This is really handy in my experience as you can prototype code in the interpreter with working data/state, validate that it works right and then grab it from that window, copy it, and then paste it into a script/module to "graduate" it to more matured code.
That's what I could muster up off the top of my head. Pycharm in general has a ton of other nice things going for it, but ultimately, it is the really smooth REPL experience and how well integrated the shell is with the IDE that makes it my go-to IDE for anything Python.
Given what you've said, I think the interactive window can be iPython in VS Code. This may be a relatively recent development.
I'm not sure what I did to enable this, but it may be because I have the jupyter extension for VS code installed.
For instance:
- If I do `df.plot()`, I get a matplotlib image displayed directly in the interactive window
- Auto indentation works
- Tab completion works
- Commands like %load_ext autoreload work.
It sounds like the integration isn't as tight as with PyCharm. For example, you mention having ipython available in a debugging session. That's one thing that I haven't managed to get working in VS Code.
100% agree on the vscode part. I like vscode a lot for various reasons and use it for various programming languages including Python, but the debugging console is such a pain to use. Would love an IPython integration.
One thing I've done to provide an enhanced debug experience when debugging in VS Code: from the debugging console, run:
from IPython import embed; embed()
This will open iPython in the terminal window with the state of your program at the debug point loaded in. You do need to "quit()" it before moving on in the debugger though.
I was using IPython to develop code by inserting the REPL to the right point in my project. But more recently I got used to VSCode and enjoy its ability to jump around in the stack and use the debug window in context. But the experience of editing code in the debug window is much inferior to IPython. It should be a regular editor and Tab should just insert a Tab, if I select something from the file I should be able to send it to the debug editor. It even handicaps the use of arrows, you need to do Shift+Up and Shift+Down if you have more than one line. Also, the debugger is slow, especially when showing a Pandas dataframe. I can still invoke IPython embed from the debug window.
You should really give Pycharm (free community edition works just fine) a shot. It has a bit of a steep learning curve getting set up, but it is everything you want both from an IDE and seamless integration with an IPython shell (you can toggle a setting to make it use IPython shells if IPython is installed in the project venv)
We actually took some of the best of both worlds, interactivity via Jupyter and the IDE's strengths; make sure to check out this: https://ploomber.io/blog/vscode-pycharm/
Jupyter notebooks are a favorite among our data scientists. However, we have gone back to plain python scripts for our bigger projects due to a simple reason - one must keep the notebook page alive while running lengthy experiments on a remote server. Due to some rogue windows updates we had a couple of destroyed experiments, which (as these things go) happened at a very inopportune moment.
OTOH for quick experiments notebooks are great, although I feel like the more modern the GUI the farther back we go in terms of experience. The latest updates to visual studio code's Jupyter extension for example have turned this into a thoroughly frustrating experience for the visually impaired - gray-on-gray-on-gray text and even more gray and transparent thin lines that are supposed to clearly mark where a cell ends and where the output begins. Unfortunately no amount of fiddling with the color scheme could fix these 'design' choices...
>However, we have gone back to plain python scripts for our bigger projects due to a simple reason - one must keep the notebook page alive while running lengthy experiments on a remote server.
Known issue (it's a six year old issue IIRC). They're working on it if I'm not mistaken. They're also working on real-time collaboration.
Plug: We have long-running notebook scheduling in the background and the output is streamed and saved whether you close your browser or visit from another device.
We run the notebooks on your own Kubernetes cluster on GCP's GKE, AWS' EKS, Azure's AKS, DigitalOcean, and pretty much anything.
The run saves everything as an experiment: it automatically detects model parameters without tagging cells, tinkering with metadata, or calling a tracking library. We also automatically detect the model that is produced, and the model's metrics (again, without you doing anything).
> Due to some rogue windows updates we had a couple of destroyed experiments, which (as these things go), happened at a very inopportune moment.
PSA: on all process control equipment running Windows 10, install O&O shutup10 and enable the default set of disablements. Finding out that an incubator has been sitting there baking $300,000 of Andor cameras for 61 hours while the organism library died because the Windows 10 box running the Python control stack decided to update: it’s a bad time. https://www.oo-software.com/en/shutup10
Just wanted to say thanks for this amazing project! I've been using it for years now, and for simple debugging and peeking into data files, nothing beats IPython in convenience!
Thanks! I invite you to also read the 7.x what's new as well. The debugger got a few improvements sponsored by the D. E. Shaw group (hiding and skipping frames, for example). Hope to have you contribute some code at some point, if not already.
I'm going to break the HN rule of comments not having meaningful content just to say: thanks for this work. IPython was what I used when I started programming for "fun"; it's so easy and helpful for beginners. I'm glad to see it's still actively developed and has expanded in scope so much (initially it was mostly used by the academic community before expanding into data science).
> Python has multiline strings with triple backticks
I think this should say "quote marks" instead of "backticks" since backticks are a different char, Python strings use single- or double-quote char, and three of them delimits a multiline string.
Thank you very much for your efforts! I haven't seen this mentioned in the release notes, but does this fix the remaining automatic module reload issues? Do I still have to restart IPython whenever I modify a module?
There is the %autoreload magic but it is limited; it will often fail to reload compiled modules like numpy. So there is not a single answer: sometimes it works, other times it does not.
Also chiming in to say thanks for the good work! This looks like an amazing release - I practically jumped in excitement when I saw the fish style autocomplete.
Yes. Good things here in the big list [1]. Fish style auto completion and traceback improvements are especially welcome. I find myself reaching for IPython more readily than browser Jupyter. REPL just feels better for control, though you can't beat browser for visuals.
In that vein I have a probably somewhat obscure question, but since OP is here I thought I'd give it a shot. I'd like to use a Unix shell in concert with IPython. I'd send data to the IPython kernel from zsh terminal sessions, call functions, and get data back. This data I could then send to Visidata or the browser for bespoke visualization. Or whatever else is available in the shell. I think Jupyter's messaging protocol kind of allows this, but I haven't managed to grasp the fine details enough to get anywhere. I can get to the shell from IPython, but from the outside this REPL isn't accessible from the Unix "REPL".
Thank you. Calling to shell isn’t really what I’m thinking of doing. But I have used it many times in the past for great effect. I think I have your book’s first edition! Thank you for the link. And the updated edition! I’ll have to check what’s changed.
You likely want to use something like https://github.com/jupyter/jupyter_console, or ipykernel directly to have a persistent python process. One issue is that shells are text based, so you have to do a lot of serialisation/deserialisation.
But honestly at that point I would just look into https://xon.sh/ that blends Python and Shell together. IPython and Xonsh devs are friends, so if you need anything from one into the other it's likely doable.
Thing is, I like my zsh. Sometimes I use Nu shell as well, which does structured data better. Serialization I think needs to happen anyway to get data into IPython from outside. I have tried Xonsh once or twice, but it was too large change. Unix shell does some things very well, Python I like to use for other things. It is the crossover I’d like to smooth out.
I am confused on what having a persistent Python process means in this context. Isn’t IPython already that? Jupyter console states it’s a single process IPython terminal. That does leave me wondering what is different when I start IPython vs Jupyter console. I might have assumed years ago that they are mutually exclusive…
Well, jupyter_console itself is a frontend/CLI; it starts another process (ipykernel) and communicates with it using ZMQ. IPykernel itself uses IPython as a library to execute code.
It might be easier to send code to an ipykernel started with jupyter_console, as it already has an event loop and is listening on sockets, than trying to talk to an IPython. Plus, using ZMQ you'll have richer messages than just text.
Thanks for the release. I just upgraded and the error messages are really nice. One bit I might be missing is that Python 3.10 added suggestions for AttributeError and NameError. It seems the suggestions are not stored in the exception object but calculated when the error is displayed. There is a note that this won't work with IPython, but it would be good to check if it's feasible.
It's probably feasible; I need to look into how the suggestion is stored and display it. You seem to have looked into it more than I have, do you want to open an issue with your thoughts?
I really enjoy bpython as my go-to python repl. Its Django support and autocomplete are out of this world, and the way it displays doc sections while using functions is really useful. Will take a look and compare with IPython 8.0.
Yes, bpython is good. I have plans to make the documentation better (https://github.com/jupyter/papyri) but so far I only have a few hours per week I can spend on IPython. Jedi from david halter should also get some love for better completion.
I’ve been using ptpython for a while because of its autocompletions. Really excited for that to come to ipython, but from a quick comparison ptpython’s still seems a bit better. You get automatic dropdowns where you have to hit tab in ipython (though it does seem faster than it used to be), and ptpython favors autocompleting arguments in the function signature.
IPython is more robust in various ways than ptpython so I’d prefer to switch back but maybe it still needs a bit of improvement. Open to suggestions if there is configuration I’m missing.
Seems like a great release though with tons of code cleanup.
Yeah, mostly I lack time to catch up with Jonathan Slenders' work, and have stronger backward compatibility requirements. But ptpython and ptipython are both great.
Any recommended workflow for integrating IPython and vim in 2022, preferably being able to edit notebooks, execute cells, etc.? Currently this is one of the few reasons why I'm running a full-blown IDE with Jupyter integration. A mature plugin for integration, having similar qualities to, say, Fugitive, would be a vim user's dream I suppose.
With vim and the qtconsole side by side you can send lines and selections (or entire cells delimited with #%%) to execute in the qtconsole. Plots appear in the qtconsole.
No problem with Black per se - it's my default formatter - but the latest inclusion with IPython 8.0 seems to break in a Docker environment:
Generating grammar tables from /usr/local/lib/python3.10/site-packages/blib2to3/Grammar.txt
Writing grammar tables to /root/.cache/black/21.12b0/Grammar3.10.1.final.0.pickle
Writing failed: [Errno 2] No such file or directory: '/root/.cache/black/21.12b0/tmpx51kjom5'
Generating grammar tables from /usr/local/lib/python3.10/site-packages/blib2to3/PatternGrammar.txt
Writing grammar tables to /root/.cache/black/21.12b0/PatternGrammar3.10.1.final.0.pickle
Writing failed: [Errno 2] No such file or directory: '/root/.cache/black/21.12b0/tmp80hsbuff'
An AST is a more general concept, see https://en.wikipedia.org/wiki/Abstract_syntax_tree; it basically turns your text into a tree where f(a) + f(b) is `call(plus, call(f, a), call(f, b))`. Using https://github.com/alexmojaki/stack_data we can say "the error occurred in `a`, while trying to call `f`, while calling `plus`", get the ranges in the original text, and make them yellow.
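You can poke at the tree Python itself builds with the standard library ast module:

    import ast

    tree = ast.parse("f(a) + f(b)", mode="eval")
    print(ast.dump(tree, indent=2))
    # roughly: Expression(body=BinOp(left=Call(func=Name('f'), args=[Name('a')]),
    #                                op=Add(),
    #                                right=Call(func=Name('f'), args=[Name('b')])))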