How do I append one string to another in Python?
Hey Sndhu,
I hope you are doing well. Here is the answer to your question:
The simplest way is the += operator. When a string has only one reference in CPython and you concatenate another string to it, CPython optimizes the operation by trying to extend that string in place instead of copying it. As a result, building a string with repeated += is amortized O(n) in the final length.
For example:
n = 10  # number of pieces to append
s = ""
for i in range(n):
    s += str(i)
In older versions of CPython this loop was O(n^2); with the optimization it is amortized O(n).
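A self-contained version of the loop above, wrapped in a helper whose name (build_with_iadd) is only illustrative, shows what it produces. The result is the same whether or not CPython manages to resize the string in place; the optimization only affects speed.

```python
def build_with_iadd(n: int) -> str:
    """Append str(0) .. str(n - 1) to an initially empty string with +=."""
    s = ""
    for i in range(n):
        s += str(i)  # CPython may extend s in place here when s has no other references
    return s


print(build_with_iadd(5))   # prints 01234
print(build_with_iadd(12))  # prints 01234567891011
```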
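From the CPython source (bytesobject.c, which holds the bytes version of this resize helper; str uses analogous code in unicodeobject.c):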
void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}

/* The following function breaks the notion that strings are immutable:
   it changes the size of a string. We get away with this only if there
   is only one module referencing the object. You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently. In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned. Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1;   /* invalidate cached hash value */
    return 0;
}
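The reference-count check (Py_REFCNT(v) != 1 above, with an equivalent check for str) is what keeps the trick invisible from Python code. A minimal sketch: once a second name refers to the string, += has to build a new object, and the other name keeps the old value.

```python
s = "abc"
t = s          # t and s now share one object, so its refcount is above 1
s += "def"     # resizing in place is not allowed; a new string is bound to s

print(s)       # prints abcdef
print(t)       # prints abc (the original string is unchanged)
print(s is t)  # prints False
```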
This can be verified empirically:
$ python -m timeit -s"s=''" "for i in range(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in range(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in range(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in range(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in range(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in range(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
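The commands above run their setup once per timing run, so s keeps growing across the loops within each run. A sketch that instead starts from an empty string on every measurement, using the standard timeit module (the helper name time_appends is only illustrative; the numbers it prints depend on your machine):

```python
import timeit


def time_appends(n: int, repeat: int = 3) -> float:
    """Return the best time in seconds to build an n-character string with +=."""
    stmt = "s = ''\nfor _ in range(n):\n    s += 'a'"
    # number=1: each measurement builds one fresh string from empty
    times = timeit.repeat(stmt, globals={"n": n}, repeat=repeat, number=1)
    return min(times)


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(f"{n:>9} appends: {time_appends(n) * 1000:.2f} ms")
```

If the optimization applies, each tenfold increase in n should take roughly ten times as long.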
However, this optimization is specific to the CPython implementation. Other Python implementations like PyPy or Jython may still exhibit the older O(n^2) performance.
For example, the same test on PyPy:
$ pypy -m timeit -s"s=''" "for i in range(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in range(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in range(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in range(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop
$ pypy -m timeit -s"s=''" "for i in range(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop
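This shows that PyPy is slower than CPython at every size here and scales roughly linearly only up to 10,000 appends; going from 10,000 to 100,000 appends makes it about 140 times slower, which is worse than quadratic.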
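Because the in-place trick is a CPython detail (PEP 8 advises not to rely on it), the portable way to build a long string from many pieces is to collect them in a list and join them once with str.join, whose cost is linear in the total length. A sketch, with an illustrative helper name:

```python
def build_with_join(n: int) -> str:
    """Build the same string as the += loop by joining the pieces once."""
    parts = []
    for i in range(n):
        parts.append(str(i))  # appending to a list is amortized O(1)
    return "".join(parts)     # every piece is copied into the result once


# The same result, written as a generator expression:
print("".join(str(i) for i in range(5)))  # prints 01234
print(build_with_join(12))                # prints 01234567891011
```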
Python 3.6 introduced f-strings (formatted string literals), which let you build the combined string directly:
var1 = "foo"
var2 = "bar"
var3 = f"{var1}{var2}"
print(var3) # prints foobar
You can include almost any expression inside the curly braces:
print(f"1 + 1 == {1 + 1}") # prints 1 + 1 == 2
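A few more things that can go inside the braces: a method call, arithmetic, and a format specification after a colon. The variable names are only for illustration.

```python
name = "world"
count = 3
price = 2.5

greeting = f"Hello, {name.title()}!"              # a method call
summary = f"{count} items cost {count * price}"   # arithmetic
total = f"Total: {count * price:.2f}"             # a format spec after the colon

s = "foo"
s = f"{s}bar"  # appending with an f-string builds a new string

print(greeting)  # prints Hello, World!
print(summary)   # prints 3 items cost 7.5
print(total)     # prints Total: 7.50
print(s)         # prints foobar
```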
You can also concatenate strings by calling the str __add__ method directly:
str1 = "Hello"
str2 = " World"
str3 = str1.__add__(str2)
print(str3)
Output:
Hello World
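In everyday code the + operator is the idiomatic spelling; calling __add__ directly mainly shows what + does for two str operands. A sketch comparing the spellings, including the standard library's operator.add:

```python
import operator

str1 = "Hello"
str2 = " World"

# All three spellings produce the same new string; str1 and str2 are unchanged.
via_plus = str1 + str2
via_dunder = str1.__add__(str2)
via_operator = operator.add(str1, str2)

print(via_plus)                                # prints Hello World
print(via_plus == via_dunder == via_operator)  # prints True

# += binds greeting to a new string; str1 still refers to the original
greeting = str1
greeting += str2
print(greeting)  # prints Hello World
print(str1)      # prints Hello
```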